*program define process

drop _all
set more 1
version 10

* this is the log file with the results of summ, corr, etc.

  set logtype text
  global mydate = subinstr("${S_DATE}"," ","-",.)
  capture log close
  quietly log using logfiles/process_output_$mydate, replace
  log close

  log using logfiles/process, replace
  
*
* process.do  Process the PSID variables of interest and store.
*
* Written by Karen Dynan, October 2006
* Revised by John Soroushian and Karen Dynan, 2011 & 2012
*
* Argument:
*
*    1 = last year to process variables.
* 
* Load the helper programs that are used repeatedly below.

  *  samp determines the sample and has 1 argument
  *  1 = keep all but the Latino sample (new baseline)
  *  2 = keep just the SRC cross-section (old baseline)

     capture program drop samp      
     run samp

  * lagmac shifts a series backward a year.  We use it for variables
  * like the income variable where a given wave's value corresponds
  * to the previous year. For example, we want the income variables
  * in the 2009 wave associated with 2008.  Three arguments:
  * 1 = input variable, 2 = output variable, 3 = last year of input
  * variable (4 digits).
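  *
  * Illustration only (a sketch of the assumed effect, not the lagmac source):
  * a call such as
  *     lagmac hwtxyp hwtxy 2009
  * is expected to produce, for the last wave, the equivalent of
  *     gen hwtxy08 = hwtxyp09
  * and likewise for each earlier wave, so that the suffix on the output
  * variable is the year the income was received.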

    capture program drop lagmac    
    run lagmac

  * addcpi adds the cpi for deflating purposes.  The series used here
  * is from Haver:  "All Urban Consumers (CPI-U):  All Items"

    capture program drop addcpi 
    run addcpi

  * merge_in merges data sets.  In Stata 12, these commands can all be
  * replaced with a single line,  
  *   merge 1:1 persid using raw_w_datasets/xx_w, nogen noreport
  * but problems with the Brookings system mean we can't use Stata 12 right
  * now.  What a pain.
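  *
  * For reference, a rough sketch of the old-style (pre-Stata 11) merge that a
  * call like "merge_in xx raw" presumably wraps.  This is an assumption about
  * what merge_in does; see the program itself for the actual code:
  *     merge persid using raw_w_datasets/xx_w, sort
  *     drop _merge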

    capture program drop merge_in
    run merge_in 
 
  * annamt translates "amount" variables that have a corresponding "per" variable
  * into annual amounts.  See the program itself for more info.
  
    capture program drop annamt
    run annamt
	
  * process_out opens a log file and puts summs and corrs into it
    capture program drop process_out
	run process_out

* Identify samples using the 1968 ID
*
*  1 - 2,930:      Member of, or moved into, a family from the 1968 SRC
*                  cross-section sample
*  3,001 - 3,511:  Member of, or moved into, a family from the Immigrant
*                  sample added in 1997 and 1999. Values of 3001-3441
*                  indicate families first interviewed in 1997; values of
*                  3442-3511 indicate families not interviewed until 1999.
*  5,001-6,872:    Member of, or moved into, a family from the 1968 Census
*                  sample
*  7,001-9,308:    Member of, or moved into, a family from the Latino sample
*                  added in 1990 and 1992. Values of 7001-9043 indicate
*                  families first interviewed in 1990; values of 9044-9308
*                  indicate families not interviewed until 1992.

* Identify the Latino sample 

  use raw_w_datasets/id_w
  gen byte lat = (id68 >= 7001) 

  log close
  quietly log using logfiles/process_output_$mydate, append
  table lat, c(n id68 mean id68 sd id68 min id68 max id68)
  log close
  quietly log using logfiles/process, append
  
  keep persid lat
  sort persid
  save pro_w_datasets/lat_w, replace

* Identify the cross-section sample

  use raw_w_datasets/id_w
  gen byte cross = (id68 <= 3000)
  log close
  quietly log using logfiles/process_output_$mydate, append
  table cross, c(n id68 mean id68 sd id68 min id68 max id68)
  log close
  quietly log using logfiles/process, append
  keep persid cross
  sort persid
  save pro_w_datasets/cross_w, replace

* Create a variable showing observations that are "duplicates".  I used
* to do something more complicated than this but now I think all that
* needs to be done is to flag non-heads (and set equal to missing if
* that person wasn't in any household in a given year).

  use raw_w_datasets/head_w
  merge_in id raw
  forvalues yyyy = 1968/2011 {
    local yy = substr(string(`yyyy'), 3,2)      
    capture gen byte dupl`yy' = (head`yy'==0) if id`yy'~=0
  }
   
  keep dupl* persid
  samp 1
  process_out dupl
  save pro_w_datasets/dupl_w, replace

* Create variables that will indicate whether the record is for
* someone that was a wife in various previous years.

  forvalues nnn = 1/5 {

  use raw_w_datasets/wifeid_w

  forvalues yyyy = 1968/2009 {
    local yy = substr(string(`yyyy'), 3,2)   
    scalar ltemptemp = string(`yyyy' - `nnn')
    local lyy = substr(ltemptemp, 3, 2)
    capture gen byte wifem`nnn'`yy' = (persid == wifeid`lyy')
  }
   
  keep wifem`nnn'* persid
  samp 1
  process_out wifem
  save pro_w_datasets/wifem`nnn'_w, replace
  }

* Make a single weight series.
* These weights should apply to the full non-Latino sample.
*
* Through 1992, we use the "Core Sample Family Weight," which is designed to be used for the core (non-Latino) sample.
* For 1993-96, we use the "Core Family Longitudinal Weight." As of December 2011, these weights weren't very findable
*              on the PSID website---you had to look at each individual family dataset, either in the drop-down list 
*              or in the PDF documentation.
* From 1997 on, we use a weight designed for the blended core/immigrant sample.  The immigrant sample was added in 1997,
*              and we'll want to use it for at least some purposes.  There seems to be a "core only" weight for some
*              of these years but not all of them, particularly recently, so I don't use it.  For some years, these
*              weights are hard to find (you need to look in the individual family datasets).

  use raw_w_datasets/wgt_c_w
  merge_in wgt_c_l raw
  merge_in wgt_c_i raw
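
  * Note on the wave-year checks used in the loops:  the PSID has been biennial
  * since 1997, so there is no wave in the even years from 1998 on.  The explicit
  * year lists are equivalent to the more compact condition below (shown as a
  * sketch, with yyyy standing for the loop macro):
  *     if yyyy >= 1998 & mod(yyyy, 2) == 0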

  forvalues yyyy = 1968/2009 {
    if `yyyy'==1998|`yyyy'==2000|`yyyy'==2002|`yyyy'==2004|`yyyy'==2006|`yyyy'==2008|`yyyy'==2010 {
      display "No PSID in `yyyy'."
      continue
    } 
    local yy = substr(string(`yyyy'), 3,2)
	if `yyyy' <= 1992 {
	  quietly gen wgt`yy' = wgt_c`yy'  
	  }
	else if `yyyy' <= 1996 {
	  quietly gen wgt`yy' = wgt_c_l`yy'
	  }
	else if `yyyy' <= 2009 {
   	  quietly gen wgt`yy' = wgt_c_i`yy'
	  }
    }

  * The documentation says to use the 2009 weight for 2011

    gen wgt11 = wgt09      

  * Check the data and save 
  
    drop wgt_c*
    keep persid wgt*
    samp 1
    process_out wgt
    save pro_w_datasets/wgt_w, replace

* Create a dummy indicating a change in household head or spouse.

  use raw_w_datasets/headid_w
  merge_in wifeid raw

  capture gen oldhead = headid68
  capture gen oldwife = wifeid68

  forvalues yyyy = 1969/2009 {
    if `yyyy'==1998|`yyyy'==2000|`yyyy'==2002|`yyyy'==2004|`yyyy'==2006|`yyyy'==2008|`yyyy'==2010 {
      display "No PSID in `yyyy'."
      continue
    }
	local yy = substr(string(`yyyy'), 3,2)
	capture gen byte chhdsp`yy' = 1
    capture replace chhdsp`yy' = 0 if (headid`yy'== oldhead) & (wifeid`yy'== oldwife)
    capture replace oldhead = headid`yy'
    capture replace oldwife = wifeid`yy'
  }
   
  keep chhdsp* persid
  samp 1
  process_out chhdsp
  save pro_w_datasets/chhdsp_w, replace 

* Create flag indicating whether family had positive farm receipts

  use raw_w_datasets/frmrec_w
  merge_in id raw
  forvalues yyyy = 1968/2009 {
    if `yyyy' == 1998 | `yyyy' == 2000 | `yyyy' == 2002 | `yyyy' == 2004 | `yyyy' == 2006 | `yyyy' == 2008 {
      display "No PSID in `yyyy'."
      continue
    }
	local yy = substr(string(`yyyy'), 3,2)
	capture gen farm`yy' = 0 if id`yy' ~= 0 & id`yy' ~= . 
    capture replace farm`yy' = 1 if (frmrec`yy' > 0) & (frmrec`yy' ~= .)
  }

  save tempdat, replace
  keep farm* persid
  samp 1
  process_out farm
  save pro_w_datasets/farm_w, replace 

* Create flag indicating whether family had interest in business

  use raw_w_datasets/ownbus_w
  merge_in id raw

  forvalues yyyy = 1968/2011 {
    if `yyyy'==1998|`yyyy'==2000|`yyyy'==2002|`yyyy'==2004|`yyyy'==2006|`yyyy'==2008|`yyyy'==2010 {
      display "No PSID in `yyyy'."
      continue
    }
	local yy = substr(string(`yyyy'), 3,2)
	capture gen bus`yy' = 0 if id`yy' ~= 0 & id`yy' ~= . 
    capture replace bus`yy' = 1 if (ownbus`yy' >= 5) & (ownbus`yy' ~= .)
  }

  save tempdat, replace
  keep bus* persid
  samp 1
  process_out bus
  save pro_w_datasets/bus_w, replace   

* Create education variables
* Notes:
*  1 - in 2009, they re-asked for everyone, which leads to a drop in the correlation 2007-2009
*      as compared with earlier years.
*  2 - as of December 2011, "less than high school" defined as people who explicitly report having neither
*      a high school degree nor a GED in the high school question.  People who report DK and people
*      who have something different (i.e. they were educated in another country) don't show up 
*      with a 1 for any of the education dummies (previously they were lumped with less than hs).

    use raw_w_datasets/id_w
    merge_in hgrade raw
    merge_in hhsdeg raw
    merge_in hcolldeg raw

  * Between 1993 and 2003 (except for 1997) dataset only has a new coll/hs deg reading if different head.

    replace hhsdeg94 = hhsdeg93 if hhsdeg94==0 & id94~=.
    replace hhsdeg95 = hhsdeg94 if hhsdeg95==. & id95~=.
    replace hhsdeg96 = hhsdeg95 if hhsdeg96==. & id96~=.
    replace hhsdeg97 = hhsdeg96 if hhsdeg97==0 & id97~=.
    replace hhsdeg99 = hhsdeg97 if hhsdeg99==. & id99~=.
    replace hhsdeg01 = hhsdeg99 if hhsdeg01==. & id01~=.
    replace hhsdeg03 = hhsdeg01 if hhsdeg03==. & id03~=.

    replace hcolldeg94 = hcolldeg93 if hcolldeg94==0 & id94~=.
    replace hcolldeg95 = hcolldeg94 if hcolldeg95==. & id95~=.
    replace hcolldeg96 = hcolldeg95 if hcolldeg96==. & id96~=.
    replace hcolldeg97 = hcolldeg96 if hcolldeg97==0 & id97~=.
    replace hcolldeg99 = hcolldeg97 if hcolldeg99==. & id99~=.
    replace hcolldeg01 = hcolldeg99 if hcolldeg01==. & id01~=.
    replace hcolldeg03 = hcolldeg01 if hcolldeg03==. & id03~=.

  forvalues yyyy = 1968/2009 {
    if `yyyy' == 1998 | `yyyy' == 2000 | `yyyy' == 2002 | `yyyy' == 2004 | `yyyy' == 2006 | `yyyy' == 2008 {
      display "No PSID in `yyyy'."
      continue
    }
	local yy = substr(string(`yyyy'), 3,2)

    if `yyyy' < 1975 {
      capture gen hlths`yy'   = (hgrade`yy' >=0 & hgrade`yy' <= 3) if hgrade`yy'~=.
      capture gen hhs`yy'     = (hgrade`yy' >=4 & hgrade`yy' <= 8) if hgrade`yy'~=.
      capture gen hcoll`yy'   = (hgrade`yy' >=7 & hgrade`yy' <= 8) if hgrade`yy'~=.
    }
    if `yyyy' >= 1975 & `yyyy' < 1985 {
      capture gen hlths`yy'   = (hgrade`yy' >=0 & hgrade`yy' <= 3) if hgrade`yy'~=.
      capture gen hhs`yy'     = (hgrade`yy' >=4 & hgrade`yy' <= 8) if hgrade`yy'~=.
      capture gen hcoll`yy'   = (hcolldeg`yy' == 1) if hgrade`yy'~=.
    }
    if `yyyy' >= 1985 {
      capture gen hhs`yy'     = (hhsdeg`yy'==1 | hhsdeg`yy'==2) if hhsdeg`yy'~=.
      capture gen hcoll`yy'   = (hcolldeg`yy' == 1) if hhsdeg`yy'~=.
      capture gen hlths`yy'   = (hhsdeg`yy'==3) if hhsdeg`yy'~=.
    }
  }

  save tempdat, replace
  keep hlths* persid
  samp 1
  process_out hlths
  save pro_w_datasets/hlths_w, replace 

  use tempdat
  drop hhsdeg*
  keep hhs* persid
  samp 1
  process_out hhs
  save pro_w_datasets/hhs_w, replace 

  use tempdat
  drop hcolldeg*
  keep hcoll* persid
  samp 1
  process_out hcoll
  save pro_w_datasets/hcoll_w, replace 

* Age variables

  foreach vvv in h w {
    
	use raw_w_datasets/`vvv'age_w

    * Recode DKs/NAs for 2005 and beyond (for earlier years it is done in the readfam programs)
	  forvalues yyyy = 2005(2)2009 {
        local yy = substr(string(`yyyy'), 3,2)
	    replace `vvv'age`yy' = . if `vvv'age`yy' == 999 | `vvv'age`yy' == 0
		}
		
    samp 1
	process_out `vvv'age
    save pro_w_datasets/`vvv'age_w, replace
  }

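* Now calculate average age of head and wife.  The wife age data (wage_w)
* are still in memory from the last pass of the loop above, so we only need
* to merge in the head age data.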

  merge_in hage pro

  forvalues yyyy = 1968/2009 {
    if `yyyy' == 1998 | `yyyy' == 2000 | `yyyy' == 2002 | `yyyy' == 2004 | `yyyy' == 2006 | `yyyy' == 2008 {
      display "No PSID in `yyyy'."
      continue
    }
  	local yy = substr(string(`yyyy'), 3,2)
    quietly gen hwage`yy' = 1/2*(hage`yy' + wage`yy')
    quietly replace hwage`yy' = hage`yy' if wage`yy' == 0 | wage`yy' == .
  }
  
  keep persid hw*
  samp 1
  process_out hwage
  save pro_w_datasets/hwage_w, replace

* Now calculate age squared, age cubed, age to the fourth

  foreach vvv in h w hw {
    forvalues qqq = 2/4 {
      * Load the age data once per power.  Reloading inside the year loop
      * would discard the variables generated for earlier years.
      use pro_w_datasets/`vvv'age_w, clear
      forvalues yyyy = 1968/2009 {
        if `yyyy' == 1998 | `yyyy' == 2000 | `yyyy' == 2002 | `yyyy' == 2004 | `yyyy' == 2006 | `yyyy' == 2008 {
          display "No PSID in `yyyy'."
          continue
        }
        local yy = substr(string(`yyyy'), 3,2)
        capture gen `vvv'age`qqq'`yy' = `vvv'age`yy'^`qqq'
      }
      quietly compress
      keep persid `vvv'age`qqq'*
      save pro_w_datasets/`vvv'age`qqq'_w, replace
    }
  }

* Employment status variable
* hestat_1 used as a substitute starting in 1994, when they started to report
* three "mentions" of the variable.

  use raw_w_datasets/hestat_w
  merge_in hestat_1 raw 
  forvalues yyyy = 1994/2009 {
    if `yyyy' == 1998 | `yyyy' == 2000 | `yyyy' == 2002 | `yyyy' == 2004 | `yyyy' == 2006 | `yyyy' == 2008 {/*XXX Added this if statement*/
      display "No PSID in `yyyy'."
      continue
    }     
    local yy = substr(string(`yyyy'), 3,2)
    gen hestat`yy' = hestat_1`yy'
    }
  drop hestat_1*
  replace hestat03 = . if hestat03==22
  replace hestat05 = . if hestat05==0 | hestat05==99
  replace hestat07 = . if hestat07==98 | hestat07==99
  replace hestat09 = . if hestat09==99
  keep persid hestat*
  samp 1 
  process_out hestat
  save pro_w_datasets/hestat_w, replace
  
* Create a dummy for student head.

  use pro_w_datasets/hestat_w

  forvalues yyyy = 1968/2009 {
    local yy = substr(string(`yyyy'), 3,2)
    * Note break in coding
      if `yyyy' <= 1975  {
        capture gen byte studh`yy' = hestat`yy'==5 if hestat`yy' ~= .
      }
      if `yyyy' > 1975 { 
        capture gen byte studh`yy' = hestat`yy'==7 if hestat`yy' ~= .
      } 
  }

  keep studh* persid
  samp 1
  process_out studh
  save pro_w_datasets/studh_w, replace

* Create a dummy for retired head.

  use pro_w_datasets/hestat_w

  forvalues yyyy = 1968/2009 {
    local yy = substr(string(`yyyy'), 3,2)

    * Note break in coding
      if `yyyy' <= 1975  {
        capture gen byte reth`yy' = hestat`yy'==3 if hestat`yy' ~= .
      }
      if `yyyy' > 1975 { 
        capture gen byte reth`yy' = hestat`yy'==4 if hestat`yy' ~= .
      } 
  }

  keep reth* persid
  samp 1
  process_out reth
  save pro_w_datasets/reth_w, replace

* Employment status variable for wife
* westat_1 used as a substitute starting in 1994, when they started to report
* three "mentions" of the variable.

  use raw_w_datasets/westat_w
  merge_in westat_1 raw 
  forvalues yyyy = 1994/2009 {
      if `yyyy' == 1998 | `yyyy' == 2000 | `yyyy' == 2002 | `yyyy' == 2004 | `yyyy' == 2006 | `yyyy' == 2008 {
      display "No PSID in `yyyy'."
      continue
      } 
    local yy = substr(string(`yyyy'), 3,2)
    gen westat`yy' = westat_1`yy'
    }
  drop westat_1*
  replace westat01 = . if westat01==9 | westat01==35
  replace westat05 = . if westat05==0 | westat05==99
  replace westat07 = . if westat07==0 | westat07==32 | westat07==99
  replace westat09 = . if westat09==0 | westat09==99
  keep persid westat*
  samp 1 
  process_out westat
  save pro_w_datasets/westat_w, replace
  
* Create a dummy for retired wife.  Note that the wife employment status variable
* doesn't exist for 1977 and 1978.

  use pro_w_datasets/westat_w

  capture gen byte retw76 = westat76==4 if westat76~=. 

  forvalues yyyy = 1979/2009 {
    local yy = substr(string(`yyyy'), 3,2)
    capture gen byte retw`yy' = westat`yy'==4 if westat`yy'~=. 
  }

  keep retw* persid
  samp 1
  process_out retw
  save pro_w_datasets/retw_w, replace

* Create a dummy for "no wife".

  use raw_w_datasets/wifeid_w

  forvalues yyyy = 1968/2009 {
    local yy = substr(string(`yyyy'), 3,2)
    capture gen byte nowife`yy' = wifeid`yy'==.
  }

  keep nowife* persid
  samp 1
  process_out nowife
  save pro_w_datasets/nowife_w, replace

* Create dummy indicating food accuracy problems (a "major" assignment in any of the
* years that the person was a head).

  use raw_w_datasets/acfdrs_w
  merge_in acfdhm raw
  merge_in acfsmn raw
  merge_in fdhm_fs_acc raw
  merge_in fdhm_nfs_acc raw
  merge_in fddel_fs_acc raw
  merge_in fddel_nfs_acc raw
  merge_in fdrs_fs_acc raw
  merge_in fdrs_nfs_acc raw
  merge_in head raw
  
  capture gen tempsum = 0

  * Loop through the years and add 1 to tempsum if there is a food accuracy problem.
  * Use return codes to skip the year if the food variables aren't defined.

    forvalues yyyy = 1974/2009 {
      local yy = substr(string(`yyyy'), 3,2)
      if `yyyy' < 1993 {
        capture replace tempsum = tempsum + (acfdhm`yy'==2)*(head`yy'==1) + (acfdrs`yy'==2)*(head`yy'==1)
        if _rc == 111 {
          display "The food variables are not available in `yyyy'."
          continue
        }
      }
      if `yyyy' == 2001 | `yyyy' == 2003 | `yyyy' == 2005 | `yyyy' == 2007 | `yyyy' == 2009 {
        #delimit ;
        capture replace tempsum = tempsum 
                                + (fdhm_nfs_acc`yy'==1)*(head`yy'==1) + (fdhm_fs_acc`yy'==1)*(head`yy'==1)
                                + (fddel_nfs_acc`yy'==1)*(head`yy'==1) + (fddel_fs_acc`yy'==1)*(head`yy'==1)
                                + (fdrs_nfs_acc`yy'==1)*(head`yy'==1) + (fdrs_fs_acc`yy'==1)*(head`yy'==1)
                                ;
        #delimit cr
      }
      if `yyyy' < 1975 {
        continue
      }
      capture replace tempsum = tempsum + (acfsmn`yy'==2)*(head`yy'==1)
    }

  tab tempsum
  capture gen byte acfd = (tempsum > 0) 

  keep persid acfd
  samp 1
  process_out acfd
  save pro_w_datasets/acfd_w, replace

* Create dummy indicating rent accuracy problems (a "major" assignment during the
* full period).
 
  use raw_w_datasets/acrent_w
  merge_in head raw
  capture gen tempsum = 0

  * Loop through the years and add 1 to tempsum if there is a rent accuracy problem.
  * Use return codes to skip the year if the rent variable isn't defined.

    forvalues yyyy = 1974/2011 {
      if `yyyy'==1998|`yyyy'==2000|`yyyy'==2002|`yyyy'==2004|`yyyy'==2006|`yyyy'==2008|`yyyy'==2010  {
        display "No PSID in `yyyy'"
        continue
      }
      local yy = substr(string(`yyyy'), 3,2)
      capture replace tempsum = tempsum + (acrent`yy'==2)*(head`yy'==1)
      if _rc == 111 {
        display "The rent accuracy variable is not available in `yyyy'."
        continue
      }
      if `yyyy' >= 2001 {
        capture replace tempsum = tempsum + (acrent`yy'==1)*(head`yy'==1)
      }
    }

  tab tempsum
  capture gen byte acrent = (tempsum > 0) 

  keep persid acrent
  samp 1
  process_out acrent
  save pro_w_datasets/acrent_w, replace

* Create a dummy indicating whether there has been a major assignment in house
* value in any year in which the person was head.

  use raw_w_datasets/achval_w
  merge_in head raw
  capture gen tempsum = 0

  * Loop through the years and add 1 to tempsum if there is a housevalue accuracy problem.
  * Use return codes to skip the year if the house value variable isn't defined.

    forvalues yyyy = 1977/2009 {
      if `yyyy' == 1998 | `yyyy' == 2000 | `yyyy' == 2002 | `yyyy' == 2004 | `yyyy' == 2006 | `yyyy' == 2008 {
        display "No PSID in `yyyy'"
        continue
      }
      local yy = substr(string(`yyyy'), 3,2)
      capture replace tempsum = tempsum + (achval`yy'==2)*(head`yy'==1)
      if _rc == 111 {
        display "The house value accuracy variable is not available in `yyyy'."
        continue
      }
      if `yyyy' >= 2001 {
        capture replace tempsum = tempsum + (achval`yy'==1)*(head`yy'==1)
      }
    }

  tab tempsum
  capture gen byte achsval = (tempsum > 0) 

  keep persid achsval
  samp 1
  process_out achsval
  save pro_w_datasets/achval_w, replace

* Create a dummy for female head of household.

  use raw_w_datasets/hsex_w

  forvalues yyyy = 1968/2009 {
      local yy = substr(string(`yyyy'), 3,2)
      capture gen byte fem`yy' = hsex`yy'~=1 if hsex`yy'~=. 
  }

  keep fem* persid
  samp 1
  process_out fem
  save pro_w_datasets/fem_w, replace
  
* Create a dummy for white head of household.  It appears that
* they didn't ask for race in 1994-96 if the head was the same 
* and then began to ask everyone for race again in 1997.  For 
* 1994-96, I filled in race for the same heads, but this creates
* a bit of a break in the series in 1997 (the correlation in white*
* is only 85 percent between 1996 and 1997, versus ~98 percent
* before that and ~96 percent after).

  use raw_w_datasets/race_w
  merge_in id raw 

  * First make the 94-96 race variables consistent with the others

    replace race94 = race93 if race94==0 & id94~=.
    replace race95 = race94 if race95==. & id95~=.
    replace race96 = race95 if race96==. & id96~=.
    summ race9*

  forvalues yyyy = 1968/2009 {
    if `yyyy' == 1998 | `yyyy' == 2000 |`yyyy' == 2002 |`yyyy' == 2004 | `yyyy' == 2006 | `yyyy' == 2008 {
      display "No PSID in `yyyy'."
      continue
    }
    local yy = substr(string(`yyyy'), 3,2)
    capture gen byte white`yy' = race`yy'==1 if race`yy'~=. 
  }

  keep white* persid
  samp 1
  process_out white
  save pro_w_datasets/white_w, replace

*============================================================================
* Base nominal income series:  this used to be done in loops since the code
* is somewhat repetitive, but I decided that it was more transparent without
* the loops.
*
* Note that income in the PSID is always reported for the previous year, so
* the data need to be shifted.

* head/wife taxable income

  use raw_w_datasets/hwtxyp_w
  merge_in id raw
  forvalues yyyy = 1994/2009 {
        local yy = substr(string(`yyyy'), 3,2)
        capture replace hwtxyp`yy' = 0 if hwtxyp`yy' == . & id`yy' ~= 0 
		}
  lagmac hwtxyp hwtxy 2009
  keep persid hwtxy*
  samp 1
  process_out hwtxy
  save pro_w_datasets/hwtxy_w, replace
  
* head/wife transfer income

  use raw_w_datasets/hwtryp_w
  merge_in id raw
  forvalues yyyy = 1994/2009 {
        local yy = substr(string(`yyyy'), 3,2)
        capture replace hwtryp`yy' = 0 if hwtryp`yy' == . & id`yy' ~= 0 
		}
  lagmac hwtryp hwtry 2009
  keep persid hwtry*
  samp 1
  process_out hwtry
  save pro_w_datasets/hwtry_w, replace

* Other family member taxable income

  use raw_w_datasets/otxyp_w
  merge_in id raw
  lagmac otxyp otxy 2009
  keep persid otxy*
  samp 1
  process_out otxy
  save pro_w_datasets/otxy_w, replace
  
* Other family member transfer income

  use raw_w_datasets/otryp_w
  merge_in id raw
  lagmac otryp otry 2009
  keep persid otry*
  samp 1
  process_out otry
  save pro_w_datasets/otry_w, replace

* Head labor earnings 

  use raw_w_datasets/htlyp_w
  merge_in id raw
  forvalues yyyy = 1994/2009 {
        local yy = substr(string(`yyyy'), 3,2)
        capture replace htlyp`yy' = 0 if htlyp`yy' == . & id`yy' ~= 0 
		}
  lagmac htlyp htly 2009
  keep htly* persid
  samp 1
  process_out htly
  save pro_w_datasets/htly_w, replace

* Create head labor earnings that are more consistent over time.  
* Add the labor share of business income back in after 93.  This doesn't
* fix everything, as the labor share of farm income is also stripped 
* out after 93 and it's not available to add back in, but we'll deal with that
* by dropping any family with a farm interest.

  use pro_w_datasets/htly_w
  merge_in hbslyp raw
  merge_in id raw
  forvalues yyyy = 1994/2009 {
	local yy = substr(string(`yyyy'), 3,2)
	capture replace hbslyp`yy' = 0 if hbslyp`yy' == . & id`yy' ~= 0 
	}
  lagmac hbslyp hbsly 2009

  forvalues yyyy = 1967/2009 {
	 if `yyyy' == 1997|`yyyy' == 1999|`yyyy' == 2001|`yyyy'==2003|`yyyy'==2005|`yyyy'==2007|`yyyy'==2009 {
	   display "No PSID income in `yyyy'."
	   continue
	 }       
	 local yy = substr(string(`yyyy'), 3,2)
	 if `yyyy' < 1993 {
	   capture gen htlyc`yy' = htly`yy'
	 }
	 if `yyyy' >= 1993 {
	   capture gen htlyc`yy' = htly`yy' + hbsly`yy'
	 }
  }
   
  keep htlyc* persid
  samp 1
  process_out htlyc
  save pro_w_datasets/htlyc_w, replace

* Wife labor earnings

  use raw_w_datasets/wtlyp_w
  merge_in id raw 
  forvalues yyyy = 1994/2009 {
        local yy = substr(string(`yyyy'), 3,2)
        capture replace wtlyp`yy' = 0 if wtlyp`yy' == . & id`yy' ~= 0 
		}
  lagmac wtlyp wtly 2009
  keep wtly* persid
  samp 1
  process_out wtly
  save pro_w_datasets/wtly_w, replace

* Create wife labor earnings that are more consistent over time.
* Add the labor share of business income back in after 93.  This doesn't
* fix everything, as the labor share of farm income is also stripped 
* out after 93 and it's not available to add back in, but we'll deal with that
* by dropping any family with a farm interest.

  use pro_w_datasets/wtly_w
  merge_in id raw
  merge_in wbslyp raw 
      
  forvalues yyyy = 1994/2009 {
	 local yy = substr(string(`yyyy'), 3,2)
	 capture replace wbslyp`yy' = 0 if wbslyp`yy' == . & id`yy' ~= 0 
  }
  lagmac wbslyp wbsly 2009

* Generate the consistent series

  forvalues yyyy = 1967/2009 {
    if `yyyy' == 1997|`yyyy' == 1999|`yyyy' == 2001|`yyyy'==2003|`yyyy'==2005|`yyyy'==2007|`yyyy'==2009 {
	display "No PSID income in `yyyy'."
	continue
	}
  local yy = substr(string(`yyyy'), 3,2)
  if `yyyy' < 1993 {
    capture gen wtlyc`yy' = wtly`yy'
  }
  if `yyyy' >= 1993 {
    capture gen wtlyc`yy' = wtly`yy' + wbsly`yy'
  }
  }

  keep wtlyc* persid
  samp 1
  process_out wtlyc
  save pro_w_datasets/wtlyc_w, replace

* Labor income of head and wife

  use pro_w_datasets/htly_w
  merge_in wtly pro
  merge_in id raw 
  
  forvalues yyyy = 1967/2009 {
 	local yy = substr(string(`yyyy'), 3,2)
    capture egen hwtly`yy' = rowtotal(htly`yy' wtly`yy') if htly`yy'~=. | wtly`yy'~=.
    if _rc == 111 {
    display "The h/w labor income variables are not available in `yyyy'."
    continue
    }
  }
  keep persid hwtly*
  samp 1
  process_out hwtly
  save pro_w_datasets/hwtly_w, replace

* Labor income of head and wife that is more consistent over time

  use pro_w_datasets/htlyc_w
  merge_in wtlyc pro
  merge_in id raw
  
  forvalues yyyy = 1967/2009 {
    local yy = substr(string(`yyyy'), 3,2)
    capture egen hwtlyc`yy' = rowtotal(htlyc`yy' wtlyc`yy') if htlyc`yy'~=. | wtlyc`yy'~=.
    if _rc == 111 {
    display "The h/w labor income variables are not available in `yyyy'."
    continue
    }
  }
  keep persid hwtlyc*
  samp 1
  process_out hwtlyc
  save pro_w_datasets/hwtlyc_w, replace

* Create total family nominal taxable income.
 
  use pro_w_datasets/hwtxy_w
  merge_in otxy pro
  merge_in id raw
 
  forvalues yyyy = 1967/2009 {
    local yy = substr(string(`yyyy'), 3,2)
    capture egen txy`yy' = rowtotal(hwtxy`yy' otxy`yy') if hwtxy`yy'~=. | otxy`yy'~=. 
    if _rc == 111 {
    display "The tx income variables are not available in `yyyy'."
    continue
    }
  }
  keep persid txy*
  samp 1
  process_out txy
  save pro_w_datasets/txy_w, replace

* Total family nominal transfer income.
* Need to explicitly add in social security after 1993.
 
  use pro_w_datasets/hwtry_w
  merge_in otry pro
  merge_in ssec raw
  merge_in hssec raw
  merge_in wssec raw
  merge_in ossec raw
  
  forvalues yyyy = 1967/1992 {
    local yy = substr(string(`yyyy'), 3,2)
    capture egen try`yy' = rowtotal(hwtry`yy' otry`yy') if hwtry`yy'~=. | otry`yy'~=.
    if _rc == 111 {
      display "The tr income variables are not available in `yyyy'."
    continue
    }
  }

  forvalues yyyy = 1993/2003 {
    local yy = substr(string(`yyyy'), 3,2)
    local ldyyyy = `yyyy' + 1          /* ssec var still from wave yy + 1 */
    local ldyy = substr(string(`ldyyyy'),3,2)
    capture egen try`yy' = rowtotal(hwtry`yy' otry`yy' ssec`ldyy') if hwtry`yy'~=. | ///
       otry`yy'~=. | ssec`ldyy' ~=.
    if _rc == 111 {
      display "The tr income variables are not available in `yyyy'."
    continue
    }
  }
  
  forvalues yyyy = 2004/2009 {
    local yy = substr(string(`yyyy'),3,2)
    local ldyyyy = `yyyy' + 1          /* ssec var still from wave yy + 1 */
    local ldyy = substr(string(`ldyyyy'),3,2)
    capture egen try`yy' = rowtotal(hwtry`yy' otry`yy' hssec`ldyy' wssec`ldyy' ossec`ldyy') if hwtry`yy'~=. | ///
       otry`yy'~=. | hssec`ldyy' ~=. | wssec`ldyy' ~=. | ossec`ldyy'~=.
    if _rc == 111 {
      display "The tr income variables are not available in `yyyy'."
    continue
    }
  }

  keep persid try*
  samp 1
  process_out try
  save pro_w_datasets/try_w, replace

* Family "money" income.  
* Prior to 1993, this variable was bottomcoded at 1.  Values of zero and negative
* amounts were allowed thereafter.  To make things consistent, we re-set such values
* to 1 for post-92 income values.

    use raw_w_datasets/myp_w
	merge_in id raw
	forvalues yyyy = 1994/2009 {
	 local yy = substr(string(`yyyy'), 3,2)
	 capture replace myp`yy' = 0 if myp`yy' == . & id`yy' ~= 0 
	 capture replace myp`yy' = 1 if myp`yy' <= 0 
    }
    
   lagmac myp my 2009
   keep my* persid
   samp 1
   process_out my
   save pro_w_datasets/my_w, replace

* Create h/w "capital" income  = h/w taxable - h/w labor.

  use pro_w_datasets/hwtxy_w
  merge_in hwtlyc pro
  addcpi
  forvalues yyyy = 1967/2009 {
    local yy = substr(string(`yyyy'), 3,2)
    capture gen hwcy`yy' = ( hwtxy`yy' - hwtlyc`yy')
    if _rc == 111 {
      display "The capital income variables are not available in `yyyy'."
      continue
    }
  }

  keep hwcy* persid
  samp 1
  process_out hwcy
  save pro_w_datasets/hwcy_w, replace


* Loop that deals with the fact that the topcoding in the PSID changes over time.
* In general, you don't want to include topcoded households in the analysis because
* they distort the variance calculations.  But we can't simply drop topcoded households,
* because that would be inconsistent across years:  lower topcode thresholds
* imply that many more households are topcoded.  After much thought, we decided to
* drop the same fraction of top-income households in each year.  This means figuring out 
* which year dropped the most households and using the fraction dropped in that year
* in every year.
*
* Practically speaking, we don't actually drop any households here; we just create 
* variables that indicate who should be dropped.
*
* Note too that the fraction depends on the sample used.
*
* For each variable:
*   1. We do two loops --- one corresponding to the full sample (#1) and one corresponding
*      to just the cross section (#2).
*   2. We loop through the years to find the year with the largest fraction topcoded.  The 
*      value at which things are topcoded gets raised over the years (at different times and
*      to different values for different variables).
*   3. We create variables which are set equal to 1 if the observation should be dropped. 
*
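* A minimal sketch of step 3 for one variable and year, taking the maximum
* topcoded fraction to be .005.  The variable names are illustrative and the
* percentile cutoff rule is an assumption, not necessarily what the code
* below does:
*     _pctile htly92 [aweight = adjwgt92], percentiles(99.5)
*     gen byte drophtly92 = (htly92 > r(r1)) if htly92 ~= .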

* This loop creates scalars corresponding to the value at which each variable is topcoded.
* First, we create scalars equal to the last year before each "step up" in the topcode
* value (the years are different for each variable; what a pain).

  scalar end1hwtxy = 1978
  scalar end2hwtxy = 1980
  scalar end1my    = 1978
  scalar end2my    = 1980
  scalar end1htly  = 1981
  scalar end2htly  = 1991
  scalar end1htlyc = 1981
  scalar end2htlyc = 1991
  scalar end1wtly  = 1982
  scalar end2wtly  = 1992
  scalar end1wtlyc = 1982
  scalar end2wtlyc = 1992
  scalar end1otxy  = 1982
  scalar end2otxy  = 1998
  scalar end1hwtry = 1991  /* we don't end up using these but I put them in for completeness */
  scalar end2hwtry = 1998
  scalar end1otry  = 1992
  scalar end2otry  = 2008  /* add 2 to this one when adding a wave --- see the note below */

  foreach vvv in hwtxy my htly htlyc wtly wtlyc otxy {
        local end1 = end1`vvv'
        local end2 = end2`vvv'
	  forvalues yyyy = 1967/`end1' {
	     local yy = substr(string(`yyyy'), 3,2)
	     scalar tv`vvv'`yy' = 99999
	     local start2 = `yyyy' + 1	     
	  }
	  forvalues yyyy = `start2'/`end2' {
	     local yy = substr(string(`yyyy'), 3,2)
	     scalar tv`vvv'`yy' = 999999
	     local start3 = `yyyy' + 1
	  }
	  forvalues yyyy = `start3'/2009 {
	     local yy = substr(string(`yyyy'), 3,2)
	     scalar tv`vvv'`yy' = 9999999
	  }
  }

  * other transfer income never makes that second step so we reset here /* see note above */
    scalar tvotry09 = 999999


* Note special cases:  "try" topcoded at 0.005 as recommended in the income plus
* background note.  We do the same thing for "hwtry" and "otry"
  
 foreach vvv in htly wtly hwtxy otxy my htlyc wtlyc try hwtry otry {
    forvalues sss = 1/2 {
        drop _all
        use pro_w_datasets/`vvv'_w
        keep persid
        save pro_w_datasets/t`sss'`vvv'_w, replace

       * This part of the loop calculates the maximum topcoded fraction
	   
         use pro_w_datasets/`vvv'_w
         merge_in wgt pro
         merge_in id raw
         capture lagmac wgt adjwgt 2009
         *summ adjwgt*
         capture lagmac id adjid 2009
         capture lagmac dupl adjdupl 2009
         if `sss' == 2 {
           merge_in cross pro
           quietly drop if cross~=1
           drop cross
         }
         save tempdat, replace
 
       * Starting values

         scalar maxfrac = 0
         scalar maxyr = 1998
 
         * This loop goes through the years and calculates the maximum fraction topcoded for each variable 
 
           forvalues yyyy = 1967/2009 {
		
             if "`vvv'" == "try" | "`vvv'" == "hwtry" | "`vvv'" == "otry"  {
                scalar maxfrac = .005
                scalar maxyr = 9999999
                continue, break
                }

     		 local yy = substr(string(`yyyy'), 3,2)
             drop _all
             use tempdat
             capture drop if adjdupl`yy' == 1
             capture drop if adjid`yy' == 0
             capture drop if `vvv'`yy' == 0
             if `sss' == 2 {
               capture replace adjwgt`yy' = 1 if adjwgt`yy' ~= . 
               }
             capture summ `vvv'`yy' [w=adjwgt`yy'], detail
             if _rc == 111 {
                * display "The `vvv' variable is not available in `yyyy'"
                continue
                }
             scalar sumwt = r(sum_w)
             capture summ `vvv'`yy' [w=adjwgt`yy'] if `vvv'`yy'==tv`vvv'`yy', detail
             scalar sumtop = r(sum_w)
            
             if maxfrac < sumtop/sumwt {
                scalar maxyr = `yyyy'
             }
             scalar maxfrac = max(maxfrac, sumtop/sumwt)  
          }

          display "  "
          display "  "
          display "*****"
          if `sss' == 1 {
            display "In full sample, max fraction `vvv' topcoded was " maxfrac " for " maxyr " income."
          }
          if `sss' == 2 {
            display "In SRC cross section, max fraction `vvv' topcoded was " maxfrac " for " maxyr " income."
          }
          display "*****"
          display "  "
          display "  "

        * Now create the actual flag

          forvalues yyyy = 1967/2009 {
            local yy = substr(string(`yyyy'), 3,2)
            use tempdat, clear
            capture drop if adjdupl`yy' == 1
            capture drop if adjid`yy'   == 0
            capture drop if `vvv'`yy'   == .
            capture drop if `vvv'`yy'   == 0
            if `sss' == 2 {
              capture replace adjwgt`yy' = 1 if adjwgt`yy'~=. 
            }
            capture sort `vvv'`yy'
             if _rc == 111 {
              display "The `vvv' variable is not available in `yyyy'"
              continue
             }

            gen rsumwt = sum(adjwgt`yy')
            gen t`sss'`vvv'`yy' = ((rsumwt/rsumwt[_N]) > (1-maxfrac))
            drop rsumwt

            summ t`sss'`vvv'`yy' [w=adjwgt`yy'], meanonly
		*disp "Weighted mean of t`sss'`vvv'`yy':" r(mean)
            keep persid t`sss'`vvv'`yy'
            sort persid
            merge_in t`sss'`vvv' pro 
            capture save pro_w_datasets/t`sss'`vvv'_w, replace
          }
 }
 }


* Topcoding for hw labor income is a function of the topcoding for the components.

  foreach vvv in tly tlyc {
    * Reset the suffix on every pass so it cannot carry over from an earlier iteration.
    local ccc ""
    if "`vvv'" == "tlyc" {
       local ccc "c"
    }

    forvalues sss = 1/2 {

      use pro_w_datasets/t`sss'htly`ccc'_w
      merge_in t`sss'wtly`ccc' pro

      forvalues yyyy = 1967/2009 {
          local yy = substr(string(`yyyy'), 3,2)
          capture count if t`sss'htly`ccc'`yy' == 0
          if _rc == 111 {
            continue
          }
        quietly gen t`sss'hwtly`ccc'`yy' = 0
        quietly replace t`sss'hwtly`ccc'`yy' = . if t`sss'htly`ccc'`yy'==. & t`sss'wtly`ccc'`yy'==.
        quietly replace t`sss'hwtly`ccc'`yy' = 1 if t`sss'htly`ccc'`yy'== 1 | t`sss'wtly`ccc'`yy'== 1
        }
    sort persid        
    keep persid t`sss'hwtly`ccc'*
    save pro_w_datasets/t`sss'hwtly`ccc'_w, replace
  }
}

* Topcoding for total taxable income is a function of the topcoding for the components.

  forvalues sss = 1/2 {

    use pro_w_datasets/t`sss'hwtxy_w
    merge_in t`sss'otxy pro

    forvalues yyyy = 1967/2009 {
       local yy = substr(string(`yyyy'), 3,2)
       capture count if t`sss'hwtxy`yy' == 0
       if _rc == 111 {
          continue
        }
        quietly gen t`sss'txy`yy' = 0
        quietly replace t`sss'txy`yy' = . if t`sss'hwtxy`yy'==. & t`sss'otxy`yy'==.
        quietly replace t`sss'txy`yy' = 1 if t`sss'hwtxy`yy'== 1 | t`sss'otxy`yy'== 1
        }
    sort persid        
    keep persid t`sss'txy*
    save pro_w_datasets/t`sss'txy_w, replace
  }

* Set topcoding for hwcy as a function of the topcoding for the component
* variables

   forvalues sss = 1/2 {

    use pro_w_datasets/t`sss'hwtxy_w
    merge_in t`sss'hwtly pro

    forvalues yyyy = 1967/2009 {
        local yy = substr(string(`yyyy'), 3,2)
        capture count if t`sss'hwtxy`yy' == 0
        if _rc == 111 {
          continue
        }
        quietly gen t`sss'hwcy`yy' = 0
        quietly replace t`sss'hwcy`yy' = . if t`sss'hwtxy`yy'==. & t`sss'hwtly`yy'==.
        quietly replace t`sss'hwcy`yy' = 1 if t`sss'hwtxy`yy'== 1 | t`sss'hwtly`yy'== 1
        *tab t`sss'hwcy`yy'
    }
    sort persid        
    keep persid t`sss'hwcy*
    save pro_w_datasets/t`sss'hwcy_w, replace
  }

* The topcoding does not generally bind for the hours variables.  But we
* identify them as topcoded if the corresponding earnings variable is 
* topcoded.  This ensures that the decompositions we do of hours and earnings
* variances use a consistent sample.  

   forvalues sss = 1/2 {

    use pro_w_datasets/t`sss'htlyc_w
    merge_in t`sss'wtlyc pro
    merge_in t`sss'hwtlyc pro

    forvalues yyyy = 1967/2009 {

        local yy = substr(string(`yyyy'), 3,2)
        capture count if t`sss'htlyc`yy' == 0
        if _rc == 111 {
          continue
        }
        capture count if t`sss'wtlyc`yy' == 0
        if _rc == 111 {
          continue
        }
        capture count if t`sss'hwtlyc`yy' == 0
        if _rc == 111 {
          continue
        }
        quietly gen t`sss'hhry`yy' = t`sss'htlyc`yy'
        *tab t`sss'hhry`yy'

        quietly gen t`sss'whry`yy' = t`sss'wtlyc`yy'
        *tab t`sss'whry`yy'

        quietly gen t`sss'hwhrs`yy' = t`sss'hwtlyc`yy'
        *tab t`sss'hwhrs`yy'

        foreach nnn in h w hw {
          quietly gen t`sss'`nnn'eph`yy' = t`sss'`nnn'tlyc`yy'
          *tab t`sss'`nnn'eph`yy'
        }

    }
    save tempdat, replace
    sort persid        
    keep persid t`sss'hhry*
    save pro_w_datasets/t`sss'hhry_w, replace

    use tempdat
    sort persid        
    keep persid t`sss'whry*
    save pro_w_datasets/t`sss'whry_w, replace

    use tempdat
    sort persid        
    keep persid t`sss'hwhrs*
    save pro_w_datasets/t`sss'hwhrs_w, replace

    foreach nnn in h w hw {
    use tempdat
    sort persid        
    keep persid t`sss'`nnn'eph*
    save pro_w_datasets/t`sss'`nnn'eph_w, replace
 
  }

}


*============================================================================
* Real income and log real income series.
*
* Create real values of taxable and transfer income

  local vlist "hwtxy hwtry hwcy otxy otry txy try my htly wtly hwtly htlyc wtlyc hwtlyc"
  
  foreach vvv in `vlist' {
    use pro_w_datasets/`vvv'_w
    addcpi
    forvalues yyyy = 1967/2009 {
      local yy = substr(string(`yyyy'), 3,2)
      capture gen r`vvv'`yy' = `vvv'`yy' / cpi`yy'
      if _rc == 111 {
        display "The `vvv' variable is not available in `yyyy'."
      }
      capture gen lr`vvv'`yy' = log(r`vvv'`yy')
    }
    save tempdat, replace
    keep persid r`vvv'*
    samp 1
	process_out r`vvv'
    save pro_w_datasets/r`vvv'_w, replace
    use tempdat
    keep persid lr`vvv'*
    samp 1
	log close
    quietly log using logfiles/process_output_$mydate, append
    summ lr`vvv'*
	log close
    save pro_w_datasets/lr`vvv'_w, replace
  }

*============================================================================
* Hours variables.
* Create head and wife hours worked that span the panel length.

    * Get the data

      foreach vvv in hhry whry {
	      use raw_w_datasets/`vvv'p_w
	      merge_in id raw
	      forvalues yyyy = 1994/2009 {
	        local yy = substr(string(`yyyy'), 3,2)
	        capture replace `vvv'p`yy' = 0 if `vvv'p`yy' == . & id`yy' ~= 0 
			}
	      lagmac `vvv'p `vvv' 2009
	      keep `vvv'* persid
		  process_out `vvv'
	      samp 1
	      save pro_w_datasets/`vvv'_w, replace
      }

* Create hours variable:  lag and sum.  The old version had everyone in the 
* family.  This version has just head and wife because others' hours are 
* not available for the early release.

  use pro_w_datasets/hhry_w
  merge_in whry pro

  forvalues yyyy = 1967/2009 {
    local yy = substr(string(`yyyy'), 3,2)
    capture gen hwhrs`yy' = (hhry`yy' + whry`yy')  if hhry`yy' ~= .
    if _rc == 111 {
      display "The hours variables are not available in `yyyy'."
      continue
    }
  }

  keep hwhrs* persid
  samp 1
  process_out hwhrs
  save pro_w_datasets/hwhrs_w, replace

* Create earnings per hour

  foreach nnn in h w {
    use pro_w_datasets/r`nnn'tlyc_w
    merge_in `nnn'hry pro

    forvalues yyyy = 1967/2009 {
      local yy = substr(string(`yyyy'), 3,2)
      capture gen r`nnn'eph`yy' = r`nnn'tlyc`yy' / `nnn'hry`yy' 
      if _rc == 111 {
        display "The `nnn' hours variables are not available in `yyyy'."
        continue
      }
      quietly replace r`nnn'eph`yy' = 0 if `nnn'hry`yy' == 0
    }

  keep r`nnn'eph* persid
  samp 1
  sort persid
  process_out r`nnn'eph
  save pro_w_datasets/r`nnn'eph_w, replace

  }

  use pro_w_datasets/rhwtlyc_w
  merge_in hwhrs pro

  forvalues yyyy = 1967/2009 {
    local yy = substr(string(`yyyy'),3,2)
    capture gen rhweph`yy' = rhwtlyc`yy' / hwhrs`yy' 
    if _rc == 111 {
      display "The hw hours variables are not available in `yyyy'."
      continue
    }
    quietly replace rhweph`yy' = 0 if hwhrs`yy' == 0
  }

  keep rhweph* persid
  samp 1
  sort persid
  process_out rhweph
  save pro_w_datasets/rhweph_w, replace

* Create a variable corresponding to number of earners among the head and wife. The wife employment status 
* variable doesn't go back to the beginning, so a wife is counted in the labor force based on
* her hours.

  use pro_w_datasets/hestat_w
  merge_in whry pro

  * We need to lag the head employment status variable so that it aligns with 
  * the timing of the previous year's income.  We are assuming that whether you
  * are in the labor force in t+1 is a good proxy for being in the labor force in
  * t.

  lagmac hestat tmp_hestat 2009

  forvalues yyyy = 1967/2009 {
    if `yyyy'==1997 | `yyyy'==1999 | `yyyy'==2001 | `yyyy' == 2003 | `yyyy' == 2005 | `yyyy' == 2007 | `yyyy' == 2009 {
      display "No hours variables for `yyyy'"
      continue
    }
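    local yy = substr(string(`yyyy'),3,2)
    * Employment status is recoded after 1975 so that the codes 1 and 2 tested below
    * are comparable across years.
    if `yyyy' > 1975 {
      quietly recode tmp_hestat`yy' (2=1) (3=2)
    }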
    gen earners`yy' = 0
    capture replace earners`yy' = 1 if (tmp_hestat`yy'==1 | tmp_hestat`yy'==2)
    capture replace earners`yy' = 1 if (whry`yy' >= 100 & whry`yy'~=.)
    capture replace earners`yy' = 2 if (tmp_hestat`yy'==1 | tmp_hestat`yy'==2) & (whry`yy' >= 100 & whry`yy'~=.)
  }

  keep earners* persid
  samp 1
  process_out earners
  save pro_w_datasets/earners_w, replace

* Create hours of work missed because of illness.

  foreach x in s o {

    foreach vvv in hill will {

      use raw_w_datasets/`vvv'`x'p_w
      merge_in id raw

      forvalues yyyy = 1994/2009 {
	        local yy = substr(string(`yyyy'), 3,2)
	        capture replace `vvv'`x'p`yy' = 0 if `vvv'`x'p`yy' == . & id`yy' ~= 0 
              capture replace `vvv'`x'p`yy' = 40*`vvv'`x'p`yy' 
			}
   
    * Unlag the series

      lagmac `vvv'`x'p `vvv'`x' 2009
   
    * Check the data and save 

      keep persid `vvv'`x'*
      samp 1
	  process_out `vvv'`x'
      save pro_w_datasets/`vvv'`x'_w, replace
  }
  }

* Family size.  Just copy it from the raw directory.

  copy raw_w_datasets/famsz_w.dta pro_w_datasets/famsz_w.dta, replace

* Interview month
*
*      Before 1980, 'intdat' was bracketed as follows
*
*                1 = 3/1  to 3/14
*                2 = 3/15 to 3/28
*                3 = 3/29 to 4/18
*                4 = 4/19 to 5/2
*                5 = 5/3  to 5/16
*                6 = 5/17 to 5/30
*                7 = 5/31 to 6/30
*                8 = after 7/1
*                9 = DK

  use raw_w_datasets/intdat_w
  merge_in intdat_m raw

  forvalues yyyy = 1968/2009 {
    if `yyyy' == 1998 | `yyyy' == 2000 | `yyyy' == 2002 | `yyyy' == 2004 | `yyyy' == 2006 | `yyyy' == 2008 {
      display "No PSID in `yyyy'"
      continue
    }
    local yy = substr(string(`yyyy'), 3,2)
    capture gen intmo`yy' = .
    if `yyyy' < 1980 {
      capture replace intmo`yy' = 3 if ( intdat`yy'==1 | intdat`yy'==2 )
      capture replace intmo`yy' = 4 if ( intdat`yy'==3 | intdat`yy'==4 )
      capture replace intmo`yy' = 5 if ( intdat`yy'==5 | intdat`yy'==6 )
      capture replace intmo`yy' = 6 if ( intdat`yy'==7 )
    }   
    if `yyyy' >= 1980 & `yyyy' <= 1996 {
      capture replace intmo`yy' = int(intdat`yy'/100)
    }
    if `yyyy' > 1996 {
      capture replace intmo`yy' = intdat_m`yy'
    }
  }
   
  drop intdat_m*
  keep intmo* persid
  samp 1
  process_out intmo
  save pro_w_datasets/intmo_w, replace

* Create food variables.

  use pro_w_datasets/intmo_w
  
  foreach vvv in fdhm fsyrp fsmn stincl fdrs fdhm_fs_amt fdhm_fs_per fdhm_nfs_amt fdhm_nfs_per ///
          fddel_fs_amt fddel_fs_per fddel_nfs_amt fddel_nfs_per fdrs_fs_amt fdrs_fs_per ///
		  fdrs_nfs_amt fdrs_nfs_per fsmn_amt fsmn_per {
    merge_in `vvv' raw
  }
     
  * annualized amounts
  * Note that you need to recode the 1994 "per" variables (only this wave is different).
  * Also set observations with high values (more than $1000/week) to missing.  Some of this is taking out
  * extreme outliers and some of this is taking out the DKs, NAs (which have values like 99999).
  
    foreach vvv in fsmn fdhm_fs fdhm_nfs fddel_fs fddel_nfs fdrs_fs fdrs_nfs {
	  quietly recode `vvv'_per94  (1=3) (2=4) (3=5) (4=7)
      annamt `vvv' 2009
	  forvalues yyyy = 1994/2009 {
          local yy = substr(string(`yyyy'), 3,2)
          capture replace `vvv'_aa`yy' = . if (`vvv'_aa`yy'/52) > 1000 
	  }
	  drop `vvv'_amt* `vvv'_per*
	  }

  * Price variables
  
    lagmac fsyrp fsyr 2009
    capture program drop addpfdhm
    run addpfdhm
    addpfdhm
    capture program drop addpfdrs
    run addpfdrs
    addpfdrs
    
  *-------------------------------
  * Food at home (includes money spent on food stamps)
  *
  * Calculations for 1970-93
 
    gen rfdhm70 = (fdhm70 + fsyr70) / pfdhm70
    gen rfdhm71 = (fdhm71 + fsyr71) / pfdhm71
    gen rfdhm74 = (fdhm74 + fsyr74*(stincl74==1) + fsyr74*(stincl74==9)) / pfdhm74
    gen rfdhm75 = (fdhm75 + 12*fsmn75*(stincl75==1) + 12*fsmn75*(stincl75==9)) / pfdhm75
    gen rfdhm76 = (fdhm76 + 12*fsmn76*(stincl76==1) + 12*fsmn76*(stincl76==9)) / pfdhm76

    forvalues yyyy = 1977/1993 {
      local yy = substr(string(`yyyy'), 3,2)
      capture gen rfdhm`yy' = (fdhm`yy' + 12*fsmn`yy')/pfdhm`yy'
      if _rc == 111 {
        display "The food at home variable could not be created for `yyyy'"
      }
    }

    forvalues yyyy = 1994/2009 {
	  if `yyyy' == 1998 | `yyyy' == 2000 | `yyyy' == 2002 | `yyyy' == 2004 | `yyyy' == 2006 | `yyyy' == 2008 {
        *display "No PSID in `yyyy'"
        continue
      }
      local yy = substr(string(`yyyy'), 3,2)
	  quietly gen rfdhm`yy' = (fdhm_fs_aa`yy' + fdhm_nfs_aa`yy' + fsmn_aa`yy') / pfdhm`yy'
	}  
	  
    drop fdhm* fsmn* fsyr* stincl* fs* intmo*

  * Create nominal food at home

    forvalues yyyy = 1968/2009 {
      if `yyyy' == 2000 | `yyyy' == 2002 | `yyyy' == 2004 | `yyyy' == 2006 | `yyyy' == 2008 {
        *display "No PSID in `yyyy'"
        continue
      }
	  local yy = substr(string(`yyyy'), 3,2)
      capture gen fdhm`yy' = pfdhm`yy' * rfdhm`yy'
    }

    drop pfdhm*

  *-------------------------------------------
  * Away from home (including delivered):  
  
    forvalues yyyy = 1970/1993 {
      local yy = substr(string(`yyyy'), 3,2)
      capture gen rfdrs`yy' = fdrs`yy'/pfdrs`yy'
      if _rc == 111 {
        display "The food away from home variable could not be created for `yyyy'"
      }  
    }

    forvalues yyyy = 1994/2009 {
      if `yyyy' == 1998 | `yyyy' == 2000 | `yyyy' == 2002 | `yyyy' == 2004 | `yyyy' == 2006 | `yyyy' == 2008 {
        *display "No PSID in `yyyy'"
        continue
      }
	  local yy = substr(string(`yyyy'), 3,2)
      capture gen rfdrs`yy' = (fddel_fs_aa`yy' + fddel_nfs_aa`yy' + fdrs_fs_aa`yy' + fdrs_nfs_aa`yy')/pfdrs`yy'
      if _rc == 111 {
        display "The food away from home variable could not be created for `yyyy'"
      }   
    }
    
	drop fddel* fdrs*     

*-------------------------------------------
* Total food 

    forvalues yyyy = 1968/2009 {
      if `yyyy' == 2000 | `yyyy' == 2002 | `yyyy' == 2004 | `yyyy' == 2006 | `yyyy' == 2008 {
        *display "No PSID in `yyyy'"
        continue
      }
	  local yy = substr(string(`yyyy'), 3,2)
     
      capture gen fdrs`yy' = pfdrs`yy' * rfdrs`yy'
	  capture gen rfd`yy'  = rfdrs`yy' + rfdhm`yy'
      capture gen fd`yy'   = fdhm`yy' + fdrs`yy'
    }
    
    drop pfdrs*

  samp 1
  compress
  save tempdat, replace
  
  keep persid rfd7* rfd8* rfd9* rfd0*
  process_out rfd
  save pro_w_datasets/rfd_w, replace

  use tempdat
  keep persid rfdhm7* rfdhm8* rfdhm9* rfdhm0*
  process_out rfdhm
  save pro_w_datasets/rfdhm_w, replace

  use tempdat
  keep persid rfdrs7* rfdrs8* rfdrs9* rfdrs0*
  process_out rfdrs
  save pro_w_datasets/rfdrs_w, replace

  use tempdat
  keep persid fd7* fd8* fd9* fd0*
  save pro_w_datasets/fd_w, replace

  use tempdat
  keep persid fdhm7* fdhm8* fdhm9* fdhm0*
  process_out fdhm
  save pro_w_datasets/fdhm_w, replace

  use tempdat
  keep persid fdrs7* fdrs8* fdrs9* fdrs0*
  process_out fdrs
  save pro_w_datasets/fdrs_w, replace

* House value

  use raw_w_datasets/hsval_w
  merge_in intmo pro
  merge_in id raw

  capture program drop addpshel
  run addpshel
  addpshel

  forvalues yyyy = 1970/2011 {
    local yy = substr(string(`yyyy'), 3,2) 
    capture replace hsval`yy' = 0 if hsval`yy' == . & id`yy'~=0
    capture replace hsval`yy' = . if hsval`yy' > 9999997
    capture gen rhsval`yy' = hsval`yy' / pshel`yy'
    if _rc==111 {
      display "The real house value variable could not be created for `yyyy'"
    }
  }
 
  samp 1
  quietly compress
  
  save tempdat, replace
  keep hsval* persid
  process_out hsval
  save pro_w_datasets/hsval_w, replace
  
  use tempdat
  keep rhsval* persid
  process_out rhsval
  save pro_w_datasets/rhsval_w, replace

* Create rent variable

  use raw_w_datasets/rent_w
  merge_in intmo pro
  merge_in rent_amt raw
  merge_in rent_per raw

  capture program drop addpshel
  run addpshel
  addpshel

  forvalues yyyy = 1970/1993 {   
    local yy = substr(string(`yyyy'), 3,2) 
    capture gen rrent`yy' = rent`yy' / pshel`yy'
    if _rc==111 {
      display "The rent variable could not be created for `yyyy'"
    }
  }

  * get rid of DKs, NAs
  
  replace rent_amt05 = . if rent_amt05 > 99995
  replace rent_amt07 = . if rent_amt07 > 99995
  replace rent_amt09 = . if rent_amt09 > 99995
  replace rent_amt11 = . if rent_amt11 > 99995
  
  * calculate annual amounts (note that the 94, 95 "per" variables have to be recoded to work with annamt)
  recode rent_per94 (1=5) (2=4) (4=7)
  recode rent_per95 (1=5) (2=4) (4=7)
  annamt rent 2011
  save tempdat, replace

  forvalues yyyy = 1994/2011 {
    local yy = substr(string(`yyyy'), 3,2) 
	capture gen rrent`yy' = rent_aa`yy' / pshel`yy'
    if _rc==111 {
      display "The real rent variable could not be created for `yyyy'"
    }
  }

  keep rrent* persid
  samp 1
  compress
  process_out rrent
  save pro_w_datasets/rrent_w, replace

  use tempdat
  keep rent_aa* persid
  compress
  process_out rent_aa
  save pro_w_datasets/rent_aa_w, replace


* USDA food standard variable.  Only goes through 2007.  Might want to think about 
* using the Census measure, but it only starts in 1990.    

      use raw_w_datasets/fstd_w
      merge_in id raw

	  forvalues yyyy = 1994/2007 {
        local yy = substr(string(`yyyy'), 3,2) 
		capture replace fstd`yy' = 0 if fstd`yy' == . & id`yy'~=0
		}
   
    * Check the data and save 

      keep persid fstd*
      samp 1
	  process_out fstd
      save pro_w_datasets/fstd_w, replace

* Real food standards.

  * Get the data
    
    use pro_w_datasets/fstd_w
    merge_in intmo pro

    capture program drop addpfdhm
    run addpfdhm
    addpfdhm

  * Calculate the reals

    forvalues yyyy = 1967/2009 {

      if `yyyy'==1998 | `yyyy'==2000 | `yyyy'==2002 | `yyyy'== 2004 | `yyyy'== 2006 {
        display "No PSID in `yyyy'"
        continue
      }
	  local yy = substr(string(`yyyy'), 3,2) 
      capture gen rfstd`yy' = fstd`yy' / pfdhm`yy' 
      if _rc == 111 {
        display "Real food standards cannot be calculated in `yyyy'."
      }
    }
 
  * Check the data and save 

    keep persid rfstd*
    samp 1
	process_out rfstd
    save pro_w_datasets/rfstd_w, replace

* more consumption data

    foreach vvv in tuit osch hrep furn cloth trip orec add_lse car_ins {
	  scalar hv`vvv' = 999995
	}
	foreach vvv in ptax car_rep gas park bus cab otrans {
	  scalar hv`vvv' = 99995
	}
	foreach vvv in hins {
	 scalar hv`vvv' = 9995
	} 
     
    foreach vvv in tuit osch hrep furn cloth trip orec ptax hins car_rep gas park bus cab otrans add_lse car_ins {
      use raw_w_datasets/`vvv'_amt_w
	  capture merge_in `vvv'_per raw
      forvalues yyyy = 1999(2)2009 {
        local yy = substr(string(`yyyy'), 3,2) 
        capture replace `vvv'_amt`yy' = . if `vvv'_amt`yy' > hv`vvv'
	    if inlist("`vvv'", "tuit","osch","ptax","hins") {
	  		quietly gen `vvv'_per`yy' = 6  /* question asked for previous year */
			}
		if inlist("`vvv'","car_rep","gas","park","bus","cab","otrans","add_lse") {
	  		quietly gen `vvv'_per`yy' = 5  /* question asked for previous month */
			}
  	}
    annamt `vvv' 2009
    keep persid `vvv'_aa* 
    samp 1
    process_out `vvv'_aa
    save pro_w_datasets/`vvv'_aa_w, replace
	}

* vehicle ownership

  forvalues nnn = 1/3 {
    use raw_w_datasets/veh_acq_`nnn'_w
    merge_in id raw
    forvalues yyyy = 1999(2)2003 {
      local yy = substr(string(`yyyy'), 3,2)
      quietly gen veh_yes_`nnn'`yy' = 0 if id`yy'~=0
      quietly replace veh_yes_`nnn'`yy' = 1 if veh_acq_`nnn'`yy' ~=. & id`yy'~=0
    }
    forvalues yyyy = 2005(2)2009 {
      local yy = substr(string(`yyyy'), 3,2)
	  quietly gen veh_yes_`nnn'`yy' = 0 if id`yy'~=0
	  quietly replace veh_yes_`nnn'`yy' = 1 if veh_acq_`nnn'`yy' ~=0 & id`yy'~=0
    }
    keep persid veh_yes_`nnn'*
    process_out veh_yes_`nnn'
    save pro_w_datasets/veh_yes_`nnn'_w, replace
  } 	
	
* vehicle financing info:  all we need to do here is to 
* reset missing values to 0 and NAs, DKs to missing (.)

  foreach vvv in prc dwn bora pymt {
	  scalar hv`vvv' = 999995
	}
  foreach vvv in trm pymtm {
	  scalar hv`vvv' = 995
	}

  foreach vvv in prc dwn bora pymt trm pymtm {
    forvalues nnn = 1/3 {
      use raw_w_datasets/veh_`vvv'_`nnn'_w
	  merge_in id raw
	  forvalues yyyy = 1999(2)2009 {
        local yy = substr(string(`yyyy'), 3,2)
		capture replace veh_`vvv'_`nnn'`yy' = 0 if  veh_`vvv'_`nnn'`yy' == . & id`yy'~=0
		capture replace veh_`vvv'_`nnn'`yy' = . if veh_`vvv'_`nnn'`yy' > hv`vvv'
  	  }
    keep persid veh_`vvv'_`nnn'* 
    samp 1
    process_out veh_`vvv'_`nnn'
    save pro_w_datasets/veh_`vvv'_`nnn'_w, replace
	}
	}

* expenditures on leasing and loan payments --- can't do this in the consumption
* loops above because I didn't use a parallel naming structure.

  forvalues nnn = 1/3 {
    use raw_w_datasets/veh_lse_amt_`nnn'_w
    merge_in veh_lse_per_`nnn' raw
    forvalues yyyy = 1999(2)2009 {
        local yy = substr(string(`yyyy'), 3,2)
        rename veh_lse_amt_`nnn'`yy' veh_lse_`nnn'_amt`yy'
        rename veh_lse_per_`nnn'`yy' veh_lse_`nnn'_per`yy'
        capture replace veh_lse_`nnn'_amt`yy' = . if veh_lse_`nnn'_amt`yy' > 999995
    }
    annamt veh_lse_`nnn' 2009
    keep persid veh_lse_`nnn'_aa*
    samp 1
    process_out veh_lse_`nnn'_aa
    save pro_w_datasets/veh_lse_`nnn'_aa_w, replace
  }

  forvalues nnn = 1/3 {
    use raw_w_datasets/veh_pymt_`nnn'_w
    merge_in veh_pymtp_`nnn' raw
    forvalues yyyy = 1999(2)2009 {
        local yy = substr(string(`yyyy'), 3,2)
        rename veh_pymt_`nnn'`yy' veh_pymt_`nnn'_amt`yy'
        rename veh_pymtp_`nnn'`yy' veh_pymt_`nnn'_per`yy'
        capture replace veh_pymt_`nnn'_amt`yy' = . if veh_pymt_`nnn'_amt`yy' > 999995
    }
    annamt veh_pymt_`nnn' 2009
    keep persid veh_pymt_`nnn'_aa* 
    samp 1
    process_out veh_pymt_`nnn'_aa
    save pro_w_datasets/veh_pymt_`nnn'_aa_w, replace
  }

* mortgage information

* own home 

  use raw_w_datasets/hsval_w
  merge_in id raw

  forvalues yyyy = 1968/2011 {
      if `yyyy'==1998|`yyyy'==2000|`yyyy'==2002|`yyyy'==2004|`yyyy'==2006|`yyyy'==2008|`yyyy'==2010 {
        display "No PSID in `yyyy'"
        continue
      }
      local yy = substr(string(`yyyy'), 3,2)
      capture gen homeown`yy' = (1 - (hsval`yy' == 0)) if id`yy'~=0
      capture replace homeown`yy' = 0 if hsval`yy' == . & id`yy'~=0
      capture replace homeown`yy' = . if (hsval`yy' == 8 | hsval`yy' == 9)
  }
  keep persid homeown*
  samp 1
  process_out homeown
  save pro_w_datasets/homeown_w, replace

* have mortgage 

  forvalues nnn = 1/2 {
    use raw_w_datasets/mort_yes_`nnn'_w 
    merge_in id raw  
    forvalues yyyy = 1968/2011 {
      if `yyyy'==1998 | `yyyy'==2000 | `yyyy'==2002 | `yyyy'== 2004 | `yyyy'== 2006 | `yyyy'==2008 {
        display "No PSID in `yyyy'"
        continue
      }
      local yy = substr(string(`yyyy'), 3,2)
      capture gen tempvar = (mort_yes_`nnn'`yy'==1 | mort_yes_`nnn'`yy' ==2) if id`yy'~=0  
      capture replace tempvar = . if (mort_yes_`nnn'`yy' == 8 | mort_yes_`nnn'`yy' == 9)
      capture replace mort_yes_`nnn'`yy' = tempvar
      capture drop tempvar
  }
  keep persid mort_yes_`nnn'*
  samp 1
  process_out mort_yes_`nnn'
  save pro_w_datasets/mort_yes_`nnn'_w, replace
  }

* other mortgage variables:  for these, I just need to make them consistent over the years
* typ = mortgage, home equity loan, or other type of loan
* orig = original (1) or refinanced (2)
* int_pp = interest rate (whole number)
* fxd = fixed (1) or variable (2) interest rate
* yr = year mortgage obtained
* trm = years left to pay

foreach vvv in typ orig bal_acc pymt_acc int_pp int_frc fxd yr trm {
  forvalues nnn = 1/2 {
      use raw_w_datasets/mort_`vvv'_`nnn'_w
      merge_in id raw
      forvalues yyyy = 1979/2011 {
        if `yyyy'==1998|`yyyy'==2000|`yyyy'==2002|`yyyy'==2004|`yyyy'==2006|`yyyy'==2008|`yyyy'==2010 {
          display "No PSID in `yyyy'"
          continue
        }
        local yy = substr(string(`yyyy'), 3,2)
        capture replace mort_`vvv'_`nnn'`yy' = 0 if mort_`vvv'_`nnn'`yy' == . & id`yy'~=0
        if inlist("`vvv'", "typ","orig","bal_acc","pymt_acc","fxd") {
	    capture replace mort_`vvv'_`nnn'`yy' = . if mort_`vvv'_`nnn'`yy' == 8 | mort_`vvv'_`nnn'`yy' == 9      		
	  }
        if inlist("`vvv'", "int_pp", "trm") {
	    capture replace mort_`vvv'_`nnn'`yy' = . if mort_`vvv'_`nnn'`yy' > 95    		
	  }
       if inlist("`vvv'", "int_frc") {
	    capture replace mort_`vvv'_`nnn'`yy' = . if mort_`vvv'_`nnn'`yy' > 997    		
	  }
       if inlist("`vvv'", "yr") {
	    capture replace mort_`vvv'_`nnn'`yy' = . if mort_`vvv'_`nnn'`yy' > 9995    		
	  }
    }
  keep persid mort_`vvv'_`nnn'*
  samp 1
  process_out mort_`vvv'_`nnn'
  save pro_w_datasets/mort_`vvv'_`nnn'_w, replace
  }
}

* mortgage distress variables

foreach vvv in mort_beh_yes mort_beh_mo mort_fc_yes mort_fc_mnth mort_fc_yr ///
  mort_mod mort_dist_prob {
    forvalues nnn = 1/2 {
      use raw_w_datasets/`vvv'_`nnn'_w   
      merge_in id raw
      forvalues yyyy = 2009(2)2011 {
        local yy = substr(string(`yyyy'), 3,2)
        capture replace `vvv'_`nnn'`yy' = 0 if `vvv'_`nnn'`yy' == . & id`yy'~=0
        if inlist("`vvv'", "mort_beh_yes","mort_fc_yes","mort_mod","mort_dist_prob") { 
          capture replace `vvv'_`nnn'`yy' = . if `vvv'_`nnn'`yy' == 8 | `vvv'_`nnn'`yy' == 9 
        }     
        if inlist("`vvv'", "mort_beh_mo","mort_fc_mnth") { 
          capture replace `vvv'_`nnn'`yy' = . if `vvv'_`nnn'`yy' >95
         } 
        if inlist("`vvv'", "mort_fc_yr") { 
          capture replace `vvv'_`nnn'`yy' = . if `vvv'_`nnn'`yy' >9995
        } 
      }        		
  keep persid `vvv'_`nnn'*
  samp 1
  process_out `vvv'_`nnn'
  save pro_w_datasets/`vvv'_`nnn'_w, replace
  }
}

foreach vvv in fc_st_yes fc_st_mo fc_st_yr fc_end_yes fc_losthm fc_st_amt {
  use raw_w_datasets/`vvv'_w   
  merge_in id raw
  forvalues yyyy = 2009(2)2011 {
    local yy = substr(string(`yyyy'), 3,2)
    capture replace `vvv'`yy' = 0 if `vvv'`yy' == . & id`yy'~=0
    if inlist("`vvv'", "fc_st_yes","fc_end_yes","fc_losthm") { 
      capture replace `vvv'`yy' = . if `vvv'`yy' == 8 | `vvv'`yy' == 9   
    }   
    if inlist("`vvv'", "fc_st_mo") { 
      capture replace `vvv'`yy' = . if `vvv'`yy' > 95   
    }   
    if inlist("`vvv'", "fc_st_yr") { 
      capture replace `vvv'`yy' = . if `vvv'`yy' > 9995   
    }   
    if inlist("`vvv'", "fc_st_amt") { 
      capture replace `vvv'`yy' = . if `vvv'`yy' > 9999995   
    }
  }   		
  keep persid `vvv'*
  samp 1
  process_out `vvv'
  save pro_w_datasets/`vvv'_w, replace
  }

* mortgage interest rate

  forvalues nnn = 1/2 {
    use pro_w_datasets/mort_int_pp_`nnn'_w
    merge_in mort_int_frc_`nnn' pro
    merge_in id raw
    forvalues yyyy = 1996/2011 {
      if `yyyy'==1998|`yyyy'==2000|`yyyy'==2002|`yyyy'==2004|`yyyy'==2006|`yyyy'==2008|`yyyy'==2010 {
        display "No PSID in `yyyy'"
        continue
      }
      local yy = substr(string(`yyyy'), 3,2)
      capture gen mort_int_`nnn'`yy' = mort_int_pp_`nnn'`yy' + (mort_int_frc_`nnn'`yy'/100) if id`yy'~=0
    }
    keep persid mort_int_`nnn'*
    samp 1
    process_out mort_int_`nnn'
    save pro_w_datasets/mort_int_`nnn'_w, replace
  }

* mortgage payment

  forvalues nnn = 1/2 {
      use raw_w_datasets/mort_pymt_`nnn'_w   
      merge_in id raw
      forvalues yyyy = 1993/2011 {
        if `yyyy'==1998|`yyyy'==2000|`yyyy'==2002|`yyyy'==2004|`yyyy'==2006|`yyyy'==2008|`yyyy'==2010 {
          display "No PSID in `yyyy'"
          continue
        }
        local yy = substr(string(`yyyy'), 3,2)
        capture replace mort_pymt_`nnn'`yy' = 0 if mort_pymt_`nnn'`yy' == . & id`yy'~=0
        capture replace mort_pymt_`nnn'`yy' = . if mort_pymt_`nnn'`yy' > 99995
      }
  keep persid mort_pymt_`nnn'*
  samp 1
  process_out mort_pymt_`nnn'
  save pro_w_datasets/mort_pymt_`nnn'_w, replace
  }

* mortgage balance

  forvalues nnn = 1/2 {
      use raw_w_datasets/mort_bal_amt_`nnn'_w  
      merge_in id raw 
      forvalues yyyy = 1968/1981{
        local yy = substr(string(`yyyy'), 3,2)
        capture replace mort_bal_amt_`nnn'`yy' = 0 if mort_bal_amt_`nnn'`yy' == . & id`yy'~=0
        capture replace mort_bal_amt_`nnn'`yy' = . if mort_bal_amt_`nnn'`yy' >99995
      }
      forvalues yyyy = 1982/1993{
        local yy = substr(string(`yyyy'), 3,2)
        capture replace mort_bal_amt_`nnn'`yy' = 0 if mort_bal_amt_`nnn'`yy' == . & id`yy'~=0
        capture replace mort_bal_amt_`nnn'`yy' = . if mort_bal_amt_`nnn'`yy' >999995
      }
      forvalues yyyy = 1994/2011 {
        if `yyyy'==1998|`yyyy'==2000|`yyyy'==2002|`yyyy'==2004|`yyyy'==2006|`yyyy'==2008|`yyyy'==2010 {
          display "No PSID in `yyyy'"
          continue
        }
        local yy = substr(string(`yyyy'), 3,2)
        capture replace mort_bal_amt_`nnn'`yy' = 0 if mort_bal_amt_`nnn'`yy' == . & id`yy'~=0
        capture replace mort_bal_amt_`nnn'`yy' = . if mort_bal_amt_`nnn'`yy' >9999995
      }
  keep persid mort_bal_amt_`nnn'*
  samp 1
  process_out mort_bal_amt_`nnn'
  save pro_w_datasets/mort_bal_amt_`nnn'_w, replace
  }

* wealth variables:  get rid of the NAs, DKs, and topcoded values

foreach vvv in ira_yes ira_comp {
  use raw_w_datasets/`vvv'_w   
  forvalues yyyy = 1979/2011 {
    local yy = substr(string(`yyyy'), 3,2)
    capture replace `vvv'`yy' = . if `vvv'`yy' == 8 | `vvv'`yy' == 9
  }
  keep persid `vvv'*
  samp 1
  process_out `vvv'
  save pro_w_datasets/`vvv'_w, replace
}    

foreach vvv in sechome_eqty veh_eqty bus_eqty stocks ira_amt chksav chksavira bonds  ///
  adds_alts debt_out debt_in {
  use raw_w_datasets/`vvv'_w 
  merge_in id raw 
  capture replace `vvv'84 = . if `vvv'84 > 9999995
  if inlist("`vvv'","veh_eqty","oth_debt") {
    capture replace `vvv'84 = . if `vvv'84 > 999995
  } 
  forvalues yyyy = 1989/1994 {
    local yy = substr(string(`yyyy'), 3,2)
    capture replace `vvv'`yy' = 0 if `vvv'`yy' == . & id`yy'~=0
    capture replace `vvv'`yy' = . if `vvv'`yy' > 9999995
    capture replace `vvv'`yy' = . if `vvv'`yy' == 8888888
  }
  forvalues yyyy = 1999/2011 {
    local yy = substr(string(`yyyy'), 3,2)
    capture replace `vvv'`yy' = 0 if `vvv'`yy' == . & id`yy'~=0
    capture replace `vvv'`yy' = . if `vvv'`yy' > 999999995
    capture replace `vvv'`yy' = . if `vvv'`yy' < -99999995
  }
  keep persid `vvv'*
  samp 1
  process_out `vvv'
  save pro_w_datasets/`vvv'_w, replace
}

* "Other" debt

  use raw_w_datasets/oth_dbt_w 
  merge_in cc_dbt raw
  merge_in stud_dbt raw
  merge_in med_dbt raw
  merge_in leg_dbt raw
  merge_in rel_dbt raw
  merge_in id raw 

  capture replace oth_dbt84 = . if oth_dbt84 > 999995
  forvalues yyyy = 1989/1994 {
    local yy = substr(string(`yyyy'), 3,2)
    capture replace oth_dbt`yy' = 0 if oth_dbt`yy' == . & id`yy'~=0
    capture replace oth_dbt`yy' = . if oth_dbt`yy' > 9999995
  }
  forvalues yyyy = 1999/2011 {
    local yy = substr(string(`yyyy'), 3,2)
    capture replace oth_dbt`yy' = 0 if oth_dbt`yy' == . & id`yy'~=0
    capture replace oth_dbt`yy' = . if oth_dbt`yy' > 999999995
    capture replace oth_dbt`yy' = . if oth_dbt`yy' < -99999995
    foreach vvv in cc stud med leg rel {
      capture replace `vvv'_dbt`yy' = . if `vvv'_dbt`yy' > 9999995
    }
  }
  gen oth_dbt11 = cc_dbt11 + stud_dbt11 + med_dbt11 + leg_dbt11 + rel_dbt11 
  keep persid oth_dbt*
  samp 1
  process_out oth_dbt
  save pro_w_datasets/oth_dbt_w, replace

* Have other debt?

  use raw_w_datasets/oth_dbt_w 
  merge_in cc_dbt raw
  merge_in stud_dbt raw
  merge_in med_dbt raw
  merge_in leg_dbt raw
  merge_in rel_dbt raw
  merge_in id raw 
  forvalues yyyy = 1984/2009 {
    local yy = substr(string(`yyyy'), 3,2)
    capture gen oth_dbt_yes`yy' = oth_dbt`yy' > 0 & oth_dbt`yy' ~=. if id`yy'~=0
  }
  capture gen oth_dbt_yes11 = 0 if id11~=0 
  foreach vvv in cc stud med leg rel {
      capture replace oth_dbt_yes11 = 1 if `vvv'_dbt11 > 0 & `vvv'_dbt11 ~=.
  }
  keep persid oth_dbt_yes*
  samp 1
  process_out oth_dbt_yes
  save pro_w_datasets/oth_dbt_yes_w, replace


* stocks in 2007 have a couple of "wild codes" that I need to remove
* only this year has the problem

  use pro_w_datasets/stocks_w
  replace stocks07 = . if stocks07 < 0
  save pro_w_datasets/stocks_w, replace

* moved

 use raw_w_datasets/moved_yr_w  
 merge_in id raw
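 * stop at 2001:  from 2003 on the values are 4 digits and are cleaned in the
 * next loop (running this loop through 2011 would blank every value >= 8)
 forvalues yyyy = 1993/2001 {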
    local yy = substr(string(`yyyy'), 3,2)
    capture replace moved_yr`yy' = 0 if moved_yr`yy' == . & id`yy'~=0
    capture replace moved_yr`yy' = . if moved_yr`yy' >= 8 
  }
 forvalues yyyy = 2003/2011 {
    local yy = substr(string(`yyyy'), 3,2)
    capture replace moved_yr`yy' = 0 if moved_yr`yy' == . & id`yy'~=0
    capture replace moved_yr`yy' = . if moved_yr`yy' > 9995 
  }
  keep persid moved_yr*
  samp 1
  process_out moved_yr
  save pro_w_datasets/moved_yr_w, replace

* moved why:  this variable goes back to 1969 but I'm not using it prior to 75
* because the coding changes

use raw_w_datasets/moved_why_w
merge_in id raw
forvalues yyyy = 1975/2011 {
  local yy = substr(string(`yyyy'), 3,2)
  capture replace moved_why`yy' = 0 if moved_why`yy' == . & id`yy'~=0
  capture replace moved_why`yy' = . if moved_why`yy' >= 9
}
keep persid moved_why*
samp 1
process_out moved_why
save pro_w_datasets/moved_why_w, replace

*===============================================================================
*
* New stuff for BPEA project.  None of this stuff is in John's data sets
*

* state

  use raw_w_datasets/state_w
  merge_in id raw

 * picked up the wrong variables for 2001 (the FIPS version)
 * could re-extract but that might drive John crazy, so I'm just recoding
 * also note that in 97 there's no code for immigrant families (=0) and
 * from 99 on there's no code for Latinos; not sure how to fix
 
 recode state01 ///
         (1      =       1) ///
         (2      =       50) ///
         (4      =       2) ///
         (5      =       3) ///
         (6      =       4) ///
         (8      =       5) ///
         (9      =       6) ///
         (10     =       7) ///
         (11     =       8) ///
         (12     =       9) ///
         (13     =       10) ///
         (15     =       51) ///
         (16     =       11) ///
         (17     =       12) ///
         (18     =       13) ///
         (19     =       14) ///
         (20     =       15) ///
         (21     =       16) ///
         (22     =       17) ///
         (23     =       18) ///
         (24     =       19) ///
         (25     =       20) ///
         (26     =       21) ///
         (27     =       22) ///
         (28     =       23) ///
         (29     =       24) ///
         (30     =       25) ///
         (31     =       26) ///
         (32     =       27) ///
         (33     =       28) ///
         (34     =       29) ///
         (35     =       30) ///
         (36     =       31) ///
         (37     =       32) ///
         (38     =       33) ///
         (39     =       34) ///
         (40     =       35) ///
         (41     =       36) ///
         (42     =       37) ///
         (44     =       38) ///
         (45     =       39) ///
         (46     =       40) ///
         (47     =       41) ///
         (48     =       42) ///
         (49     =       43) ///
         (50     =       44) ///
         (51     =       45) ///
         (53     =       46) ///
         (54     =       47) ///
         (55     =       48) ///
         (56     =       49) 
  forvalues yyyy = 1994/2009 {
     local yy = substr(string(`yyyy'), 3,2)
     capture replace state`yy' = 0 if state`yy' == . & id`yy'~=0
     capture replace state`yy' = . if state`yy' == 99
   }

  keep persid state*
  samp 1
  process_out state
  save pro_w_datasets/state_w, replace

* mortgage variables
*
* mort_bal_amt = total mortgage balances
* mort_pymt    = total mortgage payment

  foreach vvv in bal_amt pymt {
    use pro_w_datasets/mort_`vvv'_1_w
    merge_in mort_`vvv'_2 pro
    forvalues yyyy = 1994/2011 {
      local yy = substr(string(`yyyy'), 3,2)
      capture gen mort_`vvv'`yy' = mort_`vvv'_1`yy' + mort_`vvv'_2`yy' 
    }
    drop mort_`vvv'_*
    keep persid mort_`vvv'*
    samp 1
    process_out mort_`vvv'
    save pro_w_datasets/mort_`vvv'_w, replace
  }

* Has mortgage 1 or 2 been refinanced?

  forvalues nnn = 1/2 {
    use raw_w_datasets/mort_orig_`nnn'_w 
    merge_in id raw  
    forvalues yyyy = 1996/2011 {
      local yy = substr(string(`yyyy'), 3,2)
      capture gen mort_refid_`nnn'`yy' = 0 if id`yy'~=0 
      capture replace mort_refid_`nnn'`yy' = 1 if mort_orig_`nnn'`yy' == 2 
      capture replace mort_refid_`nnn'`yy' = . if mort_orig_`nnn'`yy' == 8 | mort_orig_`nnn'`yy' == 9
    }
  keep persid mort_refid_`nnn'*
  samp 1
  process_out mort_refid_`nnn'
  save pro_w_datasets/mort_refid_`nnn'_w, replace
  }

* Has any mortgage been refinanced?

  use pro_w_datasets/mort_refid_1_w 
  merge_in mort_refid_2 pro
  merge_in id raw
  forvalues yyyy = 1996/2011 {
    local yy = substr(string(`yyyy'), 3,2)
    capture gen mort_refid`yy' = (mort_refid_1`yy'==1|mort_refid_2`yy'==1) if id`yy'~=0 
  }
  drop mort_refid_*
  keep persid mort_refid*
  samp 1
  process_out mort_refid
  save pro_w_datasets/mort_refid_w, replace

* Vehicle loan remaining balance
*
* This is a truly horrendous calculation --- see data_notes.txt for an explanation of what's going
* on here.
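*
* A sketch of the balance formula applied at the end of the loop below (the
* notation is ours for illustration, not taken from data_notes.txt).  For a
* fully amortizing loan with amount borrowed P, monthly interest rate r,
* n total monthly payments, and k payments already made, the remaining
* balance is
*
*    B(k) = P * ( 1 - ((1+r)^k - 1) / ((1+r)^n - 1) )
*
* so B(0) = P and B(n) = 0.  In the code, P = veh_bora, r = intrate,
* n = loanlife, and k = pymts_made.  When r = 0 the ratio is 0/0, which
* Stata evaluates to missing, so loans acquired in a year with no rate in
* the intrate table end up with a missing balance.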

forvalues nnn = 1/3 {

  use pro_w_datasets/veh_bora_`nnn'_w
  merge_in veh_pymt_`nnn' pro
  merge_in veh_pymtm_`nnn' pro
  merge_in veh_pymtp_`nnn' raw
  merge_in veh_yra_`nnn' raw
  merge_in veh_trm_`nnn' pro
  merge_in id raw

    forvalues yyyy = 1999(2)2009 {
       local yy = substr(string(`yyyy'), 3,2)
       capture replace veh_yra_`nnn'`yy' = 0 if veh_yra_`nnn'`yy' == . & id`yy'~=0
       capture replace veh_yra_`nnn'`yy' = . if veh_yra_`nnn'`yy' > 9995

       * first priority is to figure out how long the loan will last (in months)
         capture gen loanlife`yy' = 0 if id`yy'~=0
         capture replace loanlife`yy' = (12/52)*veh_trm_`nnn'`yy'*(veh_pymtp_`nnn'`yy'==3) ///
                              + (12/26)*veh_trm_`nnn'`yy'*(veh_pymtp_`nnn'`yy'==4) ///
                              + (12/12)*veh_trm_`nnn'`yy'*(veh_pymtp_`nnn'`yy'==5) ///
                              + (12/1 )*veh_trm_`nnn'`yy'*(veh_pymtp_`nnn'`yy'==6)
       * payment-period codes 7-9 (presumably other/DK/NA) can't be converted to months
         capture replace loanlife`yy' = . if veh_pymtp_`nnn'`yy' >=7 & veh_pymtp_`nnn'`yy' <= 9
         
       * now copy in t-2 info if they report 0 now but had a loan at t-2 whose life
       * was at least 24 months and long enough that it should still be around in t
         if `yyyy' > 1999 {
            local l2yyyy = `yyyy' - 2
            local l2yy = substr(string(`l2yyyy'), 3,2)
            capture gen ffill = (veh_bora_`nnn'`yy' == 0 ///
                      &  veh_bora_`nnn'`l2yy' > 0 ///
                      & veh_bora_`nnn'`l2yy' ~= . ///
                      & loanlife`l2yy' >= 24 ///
                      & (`yyyy' - veh_yra_`nnn'`l2yy') < (loanlife`l2yy'/12))
            capture replace veh_bora_`nnn'`yy' = veh_bora_`nnn'`l2yy' if ffill==1
            capture replace veh_yra_`nnn'`yy'  = veh_yra_`nnn'`l2yy' if ffill==1
            capture replace loanlife`yy'   = loanlife`l2yy' if ffill==1
            drop ffill
            }

       * interest rate based on info from the G.19 release:  average of bank, fc new, fc used
         capture gen intrate`yy' = 0 if id`yy'~=0  
         capture replace intrate`yy' = 0 ///
		+ (veh_yra_`nnn'`yy'==1997) * 9.99/1200  /// 
		+ (veh_yra_`nnn'`yy'==1998) * 9.32/1200  /// 
		+ (veh_yra_`nnn'`yy'==1999) * 9.26/1200  /// 
		+ (veh_yra_`nnn'`yy'==2000) * 9.89/1200  /// 
		+ (veh_yra_`nnn'`yy'==2001) * 8.90/1200  /// 
		+ (veh_yra_`nnn'`yy'==2002) * 7.76/1200  /// 
		+ (veh_yra_`nnn'`yy'==2003) * 6.87/1200  /// 
		+ (veh_yra_`nnn'`yy'==2004) * 6.78/1200  /// 
		+ (veh_yra_`nnn'`yy'==2005) * 7.30/1200  /// 
		+ (veh_yra_`nnn'`yy'==2006) * 7.44/1200  /// 
		+ (veh_yra_`nnn'`yy'==2007) * 7.29/1200  /// 
		+ (veh_yra_`nnn'`yy'==2008) * 7.09/1200  /// 
		+ (veh_yra_`nnn'`yy'==2009) * 6.65/1200  /// 
		+ (veh_yra_`nnn'`yy'==2010) * 6.21/1200 if id`yy'~=0

      * estimate payments made if they were paying monthly (the pymtm variable
      * corresponds to a variety of frequencies, including "other", so we can't
      * use it)

        capture gen pymts_made`yy' = 0 if id`yy'~=0
        capture replace pymts_made`yy' = 12*(`yyyy' - veh_yra_`nnn'`yy') if veh_yra_`nnn'`yy' ~= 0
        capture replace pymts_made`yy' = . if pymts_made`yy' > loanlife`yy'

      gen veh_bal_amt_`nnn'`yy' = 0 if id`yy'~=0
      capture replace veh_bal_amt_`nnn'`yy' = veh_bora_`nnn'`yy' * (1 - ((1 + intrate`yy')^pymts_made`yy' - 1) /  ///
                      ((1 + intrate`yy')^loanlife`yy' - 1)) if veh_bora_`nnn'`yy' > 0  
    }
    drop veh_bal_amt_`nnn'99 veh_bal_amt_`nnn'01
    keep persid veh_bal_amt_`nnn'*
    samp 1
    process_out veh_bal_amt_`nnn'
    save pro_w_datasets/veh_bal_amt_`nnn'_w, replace
}

* Have vehicle loan?
*
* Can't simply use the veh_bal variables above because I excluded DKs and NAs, so
* I essentially have to repeat the calculation with the raw data --- UGH!
*
* Still have to exclude those with NAs for term and year acquired because we 
* can only use valid values to see if the loan is still outstanding

forvalues nnn = 1/3 {

  use raw_w_datasets/veh_bora_`nnn'_w
  merge_in veh_pymt_`nnn' raw
  merge_in veh_pymtm_`nnn' raw
  merge_in veh_pymtp_`nnn' raw
  merge_in veh_yra_`nnn' raw
  merge_in veh_trm_`nnn' pro
  merge_in id raw

    forvalues yyyy = 1999(2)2009 {
       local yy = substr(string(`yyyy'), 3,2)
     
       * Figure out how long the loan will last (in months)
         capture gen loanlife`yy' = 0 if id`yy'~=0
         capture replace loanlife`yy' = (12/52)*veh_trm_`nnn'`yy'*(veh_pymtp_`nnn'`yy'==3) ///
                              + (12/26)*veh_trm_`nnn'`yy'*(veh_pymtp_`nnn'`yy'==4) ///
                              + (12/12)*veh_trm_`nnn'`yy'*(veh_pymtp_`nnn'`yy'==5) ///
                              + (12/1 )*veh_trm_`nnn'`yy'*(veh_pymtp_`nnn'`yy'==6)
       * payment-period codes 7-9 (presumably other/DK/NA) can't be converted to months
         capture replace loanlife`yy' = . if veh_pymtp_`nnn'`yy' >=7 & veh_pymtp_`nnn'`yy' <= 9
         
       * now copy in t-2 info if they report 0 now but had a loan at t-2 whose life
       * was at least 24 months and long enough that it should still be around in t

         capture replace veh_yra_`nnn'`yy' = 0 if veh_yra_`nnn'`yy' == . & id`yy'~=0
         capture replace veh_yra_`nnn'`yy' = . if veh_yra_`nnn'`yy' > 9995
         if `yyyy' > 1999 {
            local l2yyyy = `yyyy' - 2
            local l2yy = substr(string(`l2yyyy'), 3,2)
            capture gen ffill = (veh_bora_`nnn'`yy' == 0 ///
                      &  veh_bora_`nnn'`l2yy' > 0 ///
                      & veh_bora_`nnn'`l2yy' ~= . ///
                      & loanlife`l2yy' >= 24 ///
                      & (`yyyy' - veh_yra_`nnn'`l2yy') < (loanlife`l2yy'/12))
            capture replace veh_bora_`nnn'`yy' = veh_bora_`nnn'`l2yy' if ffill==1
            capture replace veh_yra_`nnn'`yy'  = veh_yra_`nnn'`l2yy' if ffill==1
            capture replace loanlife`yy'   = loanlife`l2yy' if ffill==1
            drop ffill
            }
         gen veh_loan_yes_`nnn'`yy' = (veh_bora_`nnn'`yy' > 0 & veh_bora_`nnn'`yy' ~=.) if id`yy'~=0
    }
    keep persid veh_loan_yes_`nnn'*
    samp 1
    sort persid
    save temp`nnn', replace
}
    use temp1
    sort persid
    merge persid using temp2
    drop _merge
    sort persid
    merge persid using temp3
    drop _merge
    sort persid
    merge_in id raw
    forvalues yyyy = 2003(2)2009 {
      local yy = substr(string(`yyyy'), 3,2)
      gen veh_loan_yes`yy' = (veh_loan_yes_1`yy' + veh_loan_yes_2`yy' + veh_loan_yes_3`yy') > 0 if id`yy'~=0
    }
    drop veh_loan_yes_*
    keep persid veh_loan_yes*
    samp 1 
    process_out veh_loan_yes
    save pro_w_datasets/veh_loan_yes_w, replace

* Total $s owed on vehicles

  use pro_w_datasets/veh_bal_amt_1_w
  merge_in veh_bal_amt_2 pro
  merge_in veh_bal_amt_3 pro
  merge_in id raw
  forvalues yyyy = 2003(2)2009 {
     local yy = substr(string(`yyyy'), 3,2)
     capture gen veh_bal_amt`yy' = veh_bal_amt_1`yy' + veh_bal_amt_2`yy' + veh_bal_amt_3`yy' if id`yy'~=0
  } 
  drop veh_bal_amt_*
  keep persid veh_bal_amt*
  samp 1
  process_out veh_bal_amt
  save pro_w_datasets/veh_bal_amt_w, replace

* Vehicle loan payments 

forvalues nnn = 1/3 {

  use pro_w_datasets/veh_bora_`nnn'_w
  merge_in veh_pymt_`nnn' pro
  merge_in veh_pymtm_`nnn' pro
  merge_in veh_pymtp_`nnn' raw
  merge_in veh_yra_`nnn' raw
  merge_in veh_trm_`nnn' pro
  merge_in id raw

    forvalues yyyy = 1999(2)2009 {
       local yy = substr(string(`yyyy'), 3,2)
       capture replace veh_yra_`nnn'`yy' = 0 if veh_yra_`nnn'`yy' == . & id`yy'~=0
       capture replace veh_yra_`nnn'`yy' = . if veh_yra_`nnn'`yy' > 9995

       * first priority is to figure out how long the loan will last (in months)
         capture gen loanlife`yy' = 0 if id`yy'~=0
         capture replace loanlife`yy' = (12/52)*veh_trm_`nnn'`yy'*(veh_pymtp_`nnn'`yy'==3) ///
                              + (12/26)*veh_trm_`nnn'`yy'*(veh_pymtp_`nnn'`yy'==4) ///
                              + (12/12)*veh_trm_`nnn'`yy'*(veh_pymtp_`nnn'`yy'==5) ///
                              + (12/1 )*veh_trm_`nnn'`yy'*(veh_pymtp_`nnn'`yy'==6)
       * payment-period codes 7-9 (presumably other/DK/NA) can't be converted to months
         capture replace loanlife`yy' = . if veh_pymtp_`nnn'`yy' >=7 & veh_pymtp_`nnn'`yy' <= 9
         
       * now copy in t-2 info if they report 0 now but had a loan at t-2 whose life
       * was at least 24 months and long enough that it should still be around in t
         if `yyyy' > 1999 {
            local l2yyyy = `yyyy' - 2
            local l2yy = substr(string(`l2yyyy'), 3,2)
            capture gen ffill = (veh_bora_`nnn'`yy' == 0 ///
                      &  veh_bora_`nnn'`l2yy' > 0 ///
                      & veh_bora_`nnn'`l2yy' ~= . ///
                      & loanlife`l2yy' >= 24 ///
                      & (`yyyy' - veh_yra_`nnn'`l2yy') < (loanlife`l2yy'/12))
            capture replace veh_pymt_`nnn'`yy' = veh_pymt_`nnn'`l2yy' if ffill==1
            capture replace veh_pymtp_`nnn'`yy'  = veh_pymtp_`nnn'`l2yy' if ffill==1
            drop ffill
            }

      * calculate payments
        capture gen veh_pymt_`nnn'_amt`yy'  = veh_pymt_`nnn'`yy' 
        capture gen veh_pymt_`nnn'_per`yy'  = veh_pymtp_`nnn'`yy'     
    }
    annamt veh_pymt_`nnn' 2009
    keep persid veh_pymt_`nnn'_aa*
    samp 1
    process_out veh_pymt_`nnn'_aa
    save pro_w_datasets/veh_pymt_`nnn'_aa_w, replace
}

* Total mortgage and vehicle payments per year

  use pro_w_datasets/veh_pymt_1_aa_w
  merge_in veh_pymt_2_aa pro
  merge_in veh_pymt_3_aa pro
  merge_in mort_pymt pro
  merge_in oth_dbt pro
  merge_in id raw

  forvalues yyyy = 1999(2)2009 {
     local yy = substr(string(`yyyy'), 3,2)
     gen mort_veh_pymt`yy' = veh_pymt_1_aa`yy' ///
                                       + veh_pymt_2_aa`yy' ///
                                       + veh_pymt_3_aa`yy' ///
                                       + 12*mort_pymt`yy' if id`yy'~=0

  } 
  keep persid mort_veh_pymt*
  samp 1
  process_out mort_veh_pymt
  save pro_w_datasets/mort_veh_pymt_w, replace

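* Total debt service per year (the monthly payment on other debt is taken to
* be 2.5 percent of the balance; mort_pymt is monthly, hence the 12*)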

  use pro_w_datasets/veh_pymt_1_aa_w
  merge_in veh_pymt_2_aa pro
  merge_in veh_pymt_3_aa pro
  merge_in mort_pymt pro
  merge_in oth_dbt pro
  merge_in id raw

  forvalues yyyy = 1999(2)2009 {
     local yy = substr(string(`yyyy'), 3,2)
      gen tot_dbt_pymt`yy' = veh_pymt_1_aa`yy' ///
                                  + veh_pymt_2_aa`yy' ///
                                  + veh_pymt_3_aa`yy' ///
                                  + 12*.025*oth_dbt`yy' ///
                                  + 12*mort_pymt`yy' if id`yy'~=0
  } 
  keep persid tot_dbt_pymt*
  samp 1
  process_out tot_dbt_pymt
  save pro_w_datasets/tot_dbt_pymt_w, replace

* Number of vehicle loans

  use pro_w_datasets/veh_bal_amt_1_w
  merge_in veh_bal_amt_2 pro
  merge_in veh_bal_amt_3 pro
  merge_in id raw
  forvalues yyyy = 2003(2)2009 {
     local yy = substr(string(`yyyy'), 3,2)
     capture gen num_veh_loans`yy' = (veh_bal_amt_1`yy'>0)*(veh_bal_amt_1`yy'~=.) ///
                                   + (veh_bal_amt_2`yy'>0)*(veh_bal_amt_2`yy'~=.) ///
                                   + (veh_bal_amt_3`yy'>0)*(veh_bal_amt_3`yy'~=.) ///
                                   if id`yy'~=0
  } 
  keep persid num_veh_loans*
  samp 1
  process_out num_veh_loans
  save pro_w_datasets/num_veh_loans_w, replace

* Total debt

  use pro_w_datasets/mort_bal_amt_w
  merge_in veh_bal_amt pro
  merge_in oth_dbt pro
  merge_in id raw

  forvalues yyyy = 2003(2)2009 {
     local yy = substr(string(`yyyy'), 3,2)
     capture gen tot_dbt`yy' = mort_bal_amt`yy' + veh_bal_amt`yy' + oth_dbt`yy' if id`yy'~=0
  } 

  keep persid tot_dbt*
  samp 1
  process_out tot_dbt
  save pro_w_datasets/tot_dbt_w, replace

* Financial assets

  use pro_w_datasets/stocks_w
  merge_in ira_amt pro
  merge_in chksav pro
  merge_in bonds pro
  merge_in id raw

  forvalues yyyy = 1999(2)2011 {
     local yy = substr(string(`yyyy'), 3,2)
     capture gen tot_fin`yy' = stocks`yy' ///
                             + ira_amt`yy' ///
                             + chksav`yy' ///
                             + bonds`yy' if id`yy'~=0
  } 

  keep persid tot_fin*
  samp 1
  process_out tot_fin
  save pro_w_datasets/tot_fin_w, replace

* Net worth

  use pro_w_datasets/tot_fin_w
  merge_in hsval pro
  merge_in mort_bal_amt pro
  merge_in sechome_eqty pro
  merge_in veh_eqty pro
  merge_in bus_eqty pro
  merge_in oth_dbt pro
  merge_in id raw
 
  forvalues yyyy = 1999(2)2011 {
     local yy = substr(string(`yyyy'), 3,2)
     gen networth`yy' = hsval`yy' - mort_bal_amt`yy' ///
                              + sechome_eqty`yy' ///
                              + veh_eqty`yy' ///
                              + bus_eqty`yy' ///
                              + tot_fin`yy' ///
                              - oth_dbt`yy' if id`yy'~=0
  } 

  keep persid networth*
  samp 1
  process_out networth
  save pro_w_datasets/networth_w, replace

* Consumption aggregates

* Cars:  this year and last year (haven't decided which to use ... this year isn't a full year 
*        but it is more timely)

  use pro_w_datasets/veh_prc_1_w
  merge_in veh_yra_1 raw
  merge_in veh_prc_2 pro
  merge_in veh_yra_2 raw
  merge_in veh_prc_3 pro
  merge_in veh_yra_3 raw
  merge_in id raw
  save tempdat, replace

  forvalues yyyy = 1999(2)2009 {
       local yy = substr(string(`yyyy'), 3,2)
       gen veh_curryr`yy' = veh_prc_1`yy' * (veh_yra_1`yy'==`yyyy') ///
                          + veh_prc_2`yy' * (veh_yra_2`yy'==`yyyy') ///
                          + veh_prc_3`yy' * (veh_yra_3`yy'==`yyyy') if id`yy'~=0
  }

  keep persid veh_curryr*
  samp 1
  process_out veh_curryr
  save pro_w_datasets/veh_curryr_w, replace

  use tempdat
  forvalues yyyy = 1999(2)2009 {
       local lyyyy = `yyyy' - 1
       local yy    = substr(string(`yyyy'), 3,2)
       gen veh_prevyr`yy' = veh_prc_1`yy' * (veh_yra_1`yy'==`lyyyy') ///
                          + veh_prc_2`yy' * (veh_yra_2`yy'==`lyyyy') ///
                          + veh_prc_3`yy' * (veh_yra_3`yy'==`lyyyy') if id`yy'~=0
  }

  keep persid veh_prevyr*
  samp 1
  process_out veh_prevyr
  save pro_w_datasets/veh_prevyr_w, replace

* vehicle related (omitted car insurance because there are a lot of missing values for this category)

  use pro_w_datasets/veh_lse_1_aa_w
  merge_in veh_lse_2_aa pro
  merge_in veh_lse_3_aa pro
  merge_in add_lse_aa pro
  merge_in car_rep_aa pro
  merge_in park_aa pro
  merge_in id raw

  forvalues yyyy = 1999(2)2009 {
       local yy    = substr(string(`yyyy'), 3,2)
       gen veh_rel_aa`yy' = veh_lse_1_aa`yy' + veh_lse_2_aa`yy' + veh_lse_3_aa`yy' + add_lse_aa`yy' ///
                           + car_rep_aa`yy' ///
                           + park_aa`yy' if id`yy'~=0
  }

  keep persid veh_rel_aa*
  samp 1
  process_out veh_rel_aa
  save pro_w_datasets/veh_rel_aa_w, replace

* other transportation

  use pro_w_datasets/bus_aa_w
  merge_in cab_aa pro
  merge_in otrans_aa pro
  merge_in id raw

  forvalues yyyy = 1999(2)2009 {
       local yy    = substr(string(`yyyy'), 3,2)
       gen trans_xveh_aa`yy' = bus_aa`yy' ///
                             + cab_aa`yy' ///
                             + otrans_aa`yy' if id`yy'~=0
  }

  keep persid trans_xveh_aa*
  samp 1
  process_out trans_xveh_aa
  save pro_w_datasets/trans_xveh_aa_w, replace

* recreation

  use pro_w_datasets/trip_aa_w
  merge_in orec_aa pro
  merge_in id raw

  forvalues yyyy = 2005(2)2009 {
       local yy    = substr(string(`yyyy'), 3,2)
       gen rec_aa`yy' = trip_aa`yy' ///
                      + orec_aa`yy' if id`yy'~=0
  }
  keep persid rec_aa*
  samp 1
  process_out rec_aa
  save pro_w_datasets/rec_aa_w, replace

* education

  use pro_w_datasets/tuit_aa_w
  merge_in osch_aa pro
  merge_in id raw

  forvalues yyyy = 2005(2)2009 {
       local yy    = substr(string(`yyyy'), 3,2)
       gen educ_aa`yy' = tuit_aa`yy' ///
                       + osch_aa`yy' if id`yy'~=0
  }
  keep persid educ_aa*
  samp 1
  process_out educ_aa
  save pro_w_datasets/educ_aa_w, replace

* housing: imputed rent for homeowners = 0.06*house value

  use pro_w_datasets/rent_aa_w
  merge_in hsval pro
  merge_in hrep_aa pro
  merge_in hins_aa pro
  merge_in ptax_aa pro
  merge_in id raw

  forvalues yyyy = 2005(2)2009 {
       local yy    = substr(string(`yyyy'), 3,2)
       gen housing_aa`yy' = 0.06 * hsval`yy' ///
                          + rent_aa`yy' ///
                          + hrep_aa`yy' ///
                          + hins_aa`yy' ///
                          + ptax_aa`yy' if id`yy'~=0
  }
  keep persid housing_aa*
  samp 1
  process_out housing_aa
  save pro_w_datasets/housing_aa_w, replace

* total nonhousing consumption

  use pro_w_datasets/veh_curryr_w
  merge_in veh_rel_aa pro
  merge_in gas_aa pro
  merge_in trans_xveh_aa pro
  merge_in furn_aa pro
  merge_in cloth_aa pro
  merge_in rec_aa pro
  merge_in educ_aa pro
  merge_in fd pro
  merge_in id raw

  forvalues yyyy = 2005(2)2009 {
       local yy    = substr(string(`yyyy'), 3,2)
       gen consxh_aa`yy' = veh_curryr`yy' ///
                    + veh_rel_aa`yy' ///
                    + gas_aa`yy' ///
                    + trans_xveh_aa`yy' ///
                    + furn_aa`yy' ///
                    + cloth_aa`yy' ///
                    + rec_aa`yy' ///
                    + educ_aa`yy' ///
                    + fd`yy' if id`yy'~=0
  }
  keep persid consxh_aa*
  samp 1
  process_out consxh_aa
  save pro_w_datasets/consxh_aa_w, replace

* total consumption

  use pro_w_datasets/consxh_aa_w
  merge_in housing_aa pro
  merge_in id raw

  forvalues yyyy = 2005(2)2009 {
       local yy    = substr(string(`yyyy'), 3,2)
       gen cons_aa`yy' = consxh_aa`yy' + housing_aa`yy' if id`yy' ~= 0
  }
  keep persid cons_aa*
  samp 1
  process_out cons_aa
  save pro_w_datasets/cons_aa_w, replace

* number of cars

  use raw_w_datasets/nocars_w
  merge_in id raw
  forvalues yyyy = 1968/2009 {
       local yy    = substr(string(`yyyy'), 3,2)    
       capture replace nocars`yy' = 0 if nocars`yy'==. & id`yy'~=0
       capture replace nocars`yy' = . if nocars`yy' > 10
  }
  keep persid nocars*
  samp 1
  process_out nocars
  save pro_w_datasets/nocars_w, replace

* moved

 use raw_w_datasets/moved_w  
 merge_in id raw
 forvalues yyyy = 1969/2011 {
    local yy = substr(string(`yyyy'), 3,2)
    capture gen moved_yes`yy' = 0 if id`yy'~=0
    capture replace moved_yes`yy' = 1 if moved`yy'==1
    capture replace moved_yes`yy' = . if moved`yy'>=8
 }
  keep persid moved_yes*
  samp 1
  process_out moved_yes
  save pro_w_datasets/moved_yes_w, replace
  erase temp1.dta
  erase temp2.dta
  erase temp3.dta
  erase tempdat.dta

*end



